WHAT A CHANGE!

A research on the implications and phylogeny of single nucleotide polymorphisms (SNPs) in Alzheimer’s and Parkinson’s diseases

Joana Bordas Solé, Carla F. Cadenet, Cecília Vila Riera

December 2015
2nd Batx, Escola Sant Gregori
Tutor: Begoña Vendrell Simón


To those who suffer, have suffered or will suffer from a neurodegenerative disease. All the effort and time we have devoted to this project is for them, so that one day they will have a better life.


“The total amount of suffering per year in the natural world is beyond all decent contemplation. During the minute that it takes me to compose this sentence, thousands of animals are being eaten alive, many others are running for their lives, whimpering with fear, others are slowly being devoured from within by rasping parasites, thousands of all kinds are dying of starvation, thirst, and disease. It must be so. If there ever is a time of plenty, this very fact will automatically lead to an increase in the population until the natural state of starvation and misery is restored. In a universe of electrons and selfish genes, blind physical forces and genetic replication, some people are going to get hurt, other people are going to get lucky, and you won't find any rhyme or reason in it, nor any justice. The universe that we observe has precisely the properties we should expect if there is, at bottom, no design, no purpose, no evil, no good, nothing but pitiless indifference.”

Richard Dawkins, River Out of Eden: A Darwinian View of Life


Acknowledgements

The success and final outcome of this project required a lot of guidance and assistance from many people. In the following paragraphs we would like to take a minute to thank them.

To begin with, this project could not have been done without the previous work of all the scientists that have shared their discoveries, so we would like to thank them all.

We would especially like to thank Dr. Gerard Muntané and Dr. Gabriel Santpere, who took the time to explain the basics of genetics to us and guided us throughout the project. In addition, Dr. Santpere posed this research challenge to us, which was the starting point of our project.

We would also like to thank Dr. Josep Prous, who kindly made time to share his research with us. We are also grateful to our English teacher, Miss Usoa Sol, who made an enormous effort to correct this project even though it is not her field of expertise.

Last but not least, we would especially like to thank our amazing tutor, Begoña Vendrell, for giving us the opportunity to do such an uncommon project. She not only helped us with all our questions about the project but also encouraged us throughout the process. Her energy and enthusiasm inspired us to keep working at our best. We could not have done it without her.


Abbreviations

A: Adenine.
AA or aa: Amino acid.
AD: Alzheimer’s Disease.
Ala (A): Alanine.
APOE: Apolipoprotein E.
AP: Proteic part.
APP: Amyloid precursor protein.
Arg (R): Arginine.
Asn (N): Asparagine.
Asp (D): Aspartate.
ATP13A2: Probable cation-transporting ATPase 13A2 (PARK9).
C: Cytosine.
CDS: Coding sequence.
cDNA: Complementary deoxyribonucleic acid.
CNS: Central Nervous System.
Cys (C): Cysteine.
DJ1: Protein deglycase DJ-1.
(ENS)EMBL: European Molecular Biology Laboratory.
EOAD: Early-onset Alzheimer's Disease.
EOPD: Early-onset Parkinson's Disease.
FAD: Familial Alzheimer’s Disease.
FBXO7: F-box only protein 7.
G: Guanine.
Gln (Q): Glutamine.
Glu (E): Glutamate.
Gly (G): Glycine.
His (H): Histidine.
Ile (I): Isoleucine.
LB: Lewy Bodies.
Leu (L): Leucine.
LN: Lewy Neurites.
LRRK2: Leucine-rich repeat serine/threonine-protein kinase 2.
Lys (K): Lysine.
Met (M): Methionine.
mRNA: Messenger ribonucleic acid.
ND: Neurodegenerative Disease.
OMIM: Online Mendelian Inheritance in Man.
PARK2: Parkin (ligase).
PD: Parkinson’s Disease.
Phe (F): Phenylalanine.
PINK1: PTEN-induced putative kinase 1.
PLA2G6: 85/88 kDa calcium-independent phospholipase A2.
Pro (P): Proline.
PSEN 1: Presenilin 1.
PSEN 2: Presenilin 2.
RNA: Ribonucleic acid.
Ser (S): Serine.
SNCA: Alpha synuclein.
SNP: Single Nucleotide Polymorphism.
T: Thymine.
Thr (T): Threonine.
Trp (W): Tryptophan.
Tyr (Y): Tyrosine.
U: Uracil.
UCSC: University of California, Santa Cruz (Genome Browser).
VPS35: Vacuolar protein sorting-associated protein 35.
Val (V): Valine.


CONTENTS

0- Introduction and main objectives of this piece of work
1- Introduction to molecular aspects of neurodegenerative diseases
1.1 Neurodegenerative diseases
1.1.1 Tauopathies
1.1.1.1 Alzheimer’s Disease (AD)
1.1.2 α-synucleinopathies
1.1.2.1 Parkinson’s Disease (PD)
1.1.3 Comparison between AD (tauopathy) and PD (synucleinopathy)
1.2 Implication of gene mutations in diseases
1.2.1 Penetrance
1.2.2 SNP mutations and evolution: the dN/dS ratio
1.3 Protein expression and its implication in diseases
1.3.1 Deoxyribonucleic acid
1.3.2 Transcription of DNA and formation of the mRNA strand
1.3.3 mRNA
1.3.4 Amino acids
1.3.5 involved in EOAD and EOPD
2- Working questions and hypothesis
3- Methodology
4- Results and discussion
4.1 Mutations leading to EOAD and comparison with animals’ genes
4.1.1 APP gene
4.1.2 PSEN1 gene
4.1.3 PSEN2 gene
4.2 Mutations leading to EOPD and comparison with animals’ genes
4.2.1 DJ1 gene
4.2.2 PLA2G6 gene
4.2.3 PINK1 gene
4.2.4 VPS35 gene
4.2.5 SNCA gene
4.2.6 ATP13A2 gene
4.2.7 FBXO7 gene
4.2.8 LRRK2 gene
4.2.9 PARK2 gene
4.3 Quantification of synonymous and non-synonymous changes
4.4 dN/dS ratio
4.5 Number of mutations and phylogenetic distance
5- Conclusions
6- Future research
7- Glossary
8- References: bibliography and webgraphy
9- Annexes
9.1 Geographic origin of the studied AD mutations
9.2 Geographic origin of the studied PD mutations
9.3 CD-rom of supplementary materials


Abstract

This project studies the two most important neurodegenerative diseases: Alzheimer's and Parkinson's. It focuses on the genetic aspect of these diseases, studying the genes that code for the proteins affected in these pathologies. These are APP, PSEN1 and PSEN2 for AD, and LRRK2, PLA2G6, DJ1, PINK1, ATP13A2, PARK2, FBXO7, SNCA and VPS35 for PD.

Our objectives were to analyze the SNP mutations that lead to the malformation of the proteins with 100% penetrance, causing EOAD and EOPD, and then to compare the affected codons with the genomes of some animals.

As for the study of the mutations, we found that they are negatively selected and that the amino acid change does not have to be extreme: sometimes even a minimal change can lead to the malformation of the protein. This is because what matters most about a protein is its function, and therefore its tertiary structure.

When doing the alignments with the animals' genomes, we found that some animals present the mutations that in humans lead to EOAD or EOPD but do not have the disease. Why they do not have the disease is one of the questions we could not answer, but we formulated some hypotheses. We also found that the animals further away from humans in the phylogenetic tree present more differences in the codons involved.

Este proyecto aborda las dos enfermedades neurodegenerativas más importantes: Alzheimer y Parkinson. Se centra en la parte genética de estas enfermedades, estudiando los genes que codifican por las proteínas afectadas en estas patologías. Estos son APP, PSEN1 y PSEN2 para el Alzheimer, y LRRK2, PLA2G6, DJ1, PINK1, ATP13A2, PARK2, FBXO7, SNCA y VPS35 para el Parkinson.

Nuestros objetivos eran analizar las mutaciones que derivan en la malformación de las proteínas afectadas con 100% de penetrancia en el Alzheimer y el Parkinson, y luego comparar los codones afectados con el genoma de algunos animales elegidos.

Respecto al estudio de las mutaciones, encontramos que están seleccionadas negativamente y que el cambio de aminoácido no tiene por qué ser un cambio extremo, a veces incluso un cambio mínimo puede causar la malformación de la proteína. Esto es así ya que lo realmente importante es la función de la proteína, es decir, la estructura terciaria de esta.

Cuando llevamos a cabo los alineamientos con los genomas de los animales, encontramos que algunos animales tienen las mutaciones que en humanos causan EOAD o EOPD pero no presentan las enfermedades. Por qué no tienen las enfermedades es una pregunta que no pudimos responder pero formulamos algunas hipótesis. Hemos podido comprobar también que los animales que se encuentran más lejos en el árbol filogenético presentan más diferencias en los codones involucrados.

Aquest treball de recerca aborda les dues malalties neurodegeneratives més importants: l'Alzheimer i el Parkinson. L'estudi es centra en la part genètica d'aquestes malalties, estudiant els gens que codifiquen per les proteïnes afectades en aquestes patologies. Aquests són APP, PSEN1 i PSEN2 en l'Alzheimer, i LRRK2, PLA2G6, DJ1, PINK1, ATP13A2, PARK2, FBXO7, SNCA i VPS35 en el Parkinson.

Els nostres objectius eren analitzar les mutacions que deriven en la malformació de les proteïnes afectades en l'Alzheimer i el Parkinson, i després comparar els codons afectats amb el genoma d'alguns animals escollits.

Respecte l'estudi de les mutacions, trobem que estan seleccionades negativament i que el canvi d'aminoàcid no té per què ser un canvi extrem, a vegades fins i tot un canvi mínim pot causar la malformació de la proteïna. Això és així ja que el que realment importa és la funció de la proteïna, és a dir, la seva estructura terciària.

Quan vam dur a terme els alineaments amb el genoma dels animals, vam trobar que hi ha alguns animals que tenen els codons de les mutacions que en humans causen EOAD i EOPD però no presenten les malalties. Per què no la presenten és una pregunta que no hem pogut contestar però hem formulat algunes hipòtesis. Hem pogut comprovar també que aquelles espècies més llunyanes filogenèticament presenten més diferències en els codons involucrats.


0- Introduction and main objectives of this piece of work

The main reason why we chose this project is that we enjoy studying scientific topics in general, and we intend to study something related to Biomedical Sciences at university. From the start, we wanted to do something related to neurodegenerative diseases, as they affect a lot of people around the world and no cure is currently known. In addition, apart from the fact that little is known about this field, it is also extremely wide, and new discoveries are made regularly because of the complexity of our . Not only do these diseases have numerous different causes, some of which are determinant and some of which are due to interactions between risk factors, but they are also expressed differently in every person. The fact that they are still relatively unknown made them more interesting and appealing to us than other diseases. That is why we took a different approach to this project, leaving aside most social aspects.

In this project we have chosen to focus on the genetic and familial cases of two well-known neurodegenerative diseases: Alzheimer’s disease (AD) and Parkinson’s disease (PD).

AD and PD are very common in our society. According to the US National Institute on Aging, 1.6% of the general population has AD, but 19% have it when they are older than 79 years old, and this increases up to almost 50% with people over 80 years of age1. PD usually appears between the ages of 60 and 80, and 1 to 2% of the population over 60 presents the disease. However, this field of research is very wide and largely unknown, so we chose to focus on fewer than 10% of the cases, that is to say, those with a genetic cause that undoubtedly determines the early-onset familial cases. These usually appear around the age of 45, though it can be earlier.

Our main goal was, firstly, to acquire more knowledge about these two diseases, AD and PD, and about genetics in general. From there, we established further goals once we had identified, studied and classified all the different Mendelian mutations by gathering all the related information in worksheets, which are the working basis of our project.

1 Data extracted from the US National Institute on Aging.


In order to build the worksheets, though, we had to follow a series of steps, as the process is not straightforward, so we were constantly adding new data and thus establishing new goals throughout the project. After focusing on these specific genetic causes of the diseases, our first question was what exactly causes either EOAD or EOPD. After finding out that different genes contained the mutations involved, and that those mutations affected only a single nucleotide, we looked for all mutations of this type and their variants in the corresponding genes. Afterwards, we looked for information about the different mutations, such as their frequency in the population, the damage they may cause, their location in the genome, their geographic origin, the nucleotide change and other valuable information.

We also decided to carry out parallel research to find out whether it is scientifically established that animals do (or do not) present early-onset Alzheimer’s disease (EOAD) or early-onset Parkinson’s disease (EOPD). The last goal mentioned in the previous paragraph (obtaining the complete worksheets) is actually the most important for answering this question, as we had to compare the human sequence with the sequences of the other animals we had chosen in order to see whether each specific mutation was also found in their genomes and, if so, whether they develop the disease. This genome comparison was performed through the alignment of the chosen genome sequences and the subsequent comparison of the alignments.

We also wanted to take a closer look at the changes in the nucleotides, to see whether they were synonymous or not, and whether they translate into an amino acid of a different group, in which case we would expect the protein’s structure to be more affected, or into one from the same group and, therefore, with similar properties. Our guess was that the change would be between amino acids from different groups, as it should be a major change in order to cause these two diseases.
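The group comparison described above can be sketched in a few lines of Python. The grouping used here (nonpolar, polar, acidic, basic) is an assumption made for illustration: it is one common textbook classification, not necessarily the one used in the worksheets, and the names `AA_GROUPS`, `group_of` and `same_group` are hypothetical.

```python
# One common textbook grouping of the 20 amino acids (one-letter codes).
# ASSUMPTION: this grouping is illustrative; other classifications exist.
AA_GROUPS = {
    "nonpolar": set("AVLIMFWPG"),
    "polar": set("STCYNQ"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def group_of(aa: str) -> str:
    """Return the name of the group that contains the amino acid."""
    for name, members in AA_GROUPS.items():
        if aa in members:
            return name
    raise ValueError(f"unknown amino acid: {aa!r}")


def same_group(original: str, mutated: str) -> bool:
    """True when both amino acids belong to the same property group."""
    return group_of(original) == group_of(mutated)


print(same_group("V", "I"))  # valine to isoleucine, both nonpolar
print(same_group("E", "K"))  # glutamate (acidic) to lysine (basic)
```

Under this grouping, a change such as glutamate to lysine would count as a change between groups, while valine to isoleucine would not.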

But why approach our work from this perspective, looking at genomes? A lot of projects have been done on AD and PD, but we know of no precedent that focuses on the genetic part as we have done. Dr. Gabriel Santpere2 proposed this project to us on December 12th 2014, when we arranged a meeting with him at PRBB (Parc de Recerca Biomèdica de Barcelona), where he worked at the time, and we decided to rise to the challenge. This is the most important reason why we chose this project and how we came up with the idea of researching whether animals can or cannot present these diseases.

2 Dr. Santpere originally challenged us to compare the SNPs of EOAD and EOPD with the animals’ sequences. We accepted this challenge. For further information about Dr. Santpere, go to: http://www.researchgate.net/profile/Gabriel_Santpere

The first objective of this project is to focus on small-scale mutations that may occur in genes related to AD and PD in humans and compare them to animals.

Once we have collected all the information about the mutations occurring in those genes, including codon change, amino acid change, nationality of the patient carrying the mutation, damaging effect, coordinates and a sequence of 21 nucleotides (7 codons) surrounding the mutation, our next objective is to compare this sequence with the same region of DNA in some animals of our choice. This process is called alignment. The animals we chose are: gorilla, , orangutan, mouse, horse, pig, olive baboon, vervet-AGM, marmoset and macaque.
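As a rough illustration of the comparison step, the sketch below takes two 21-nucleotide windows that are assumed to be already aligned and reports the codons in which they differ. The sequences are invented for the example and are not real gene data; the function names are hypothetical.

```python
def split_codons(window: str) -> list[str]:
    """Split a nucleotide window into consecutive codons (triplets)."""
    if len(window) % 3 != 0:
        raise ValueError("window length must be a multiple of 3")
    return [window[i:i + 3] for i in range(0, len(window), 3)]


def compare_windows(human: str, animal: str) -> list[tuple[int, str, str]]:
    """Return (codon index, human codon, animal codon) for each difference.

    Both windows are assumed to be already aligned, with no gaps.
    """
    if len(human) != len(animal):
        raise ValueError("windows must have the same length")
    pairs = zip(split_codons(human.upper()), split_codons(animal.upper()))
    return [(i, h, a) for i, (h, a) in enumerate(pairs) if h != a]


# Hypothetical 21-nucleotide windows (7 codons), invented for illustration.
human_window = "ATGGCCAAAGTCTTAGGCTGG"
animal_window = "ATGGCCAAAGTATTAGGCTGG"
print(compare_windows(human_window, animal_window))  # [(3, 'GTC', 'GTA')]
```

Codon indices start at 0, so with 7 codons the central codon, where the mutation would sit, has index 3.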

Finally, our last objective is to analyze the codon change in humans and compare it with our selected animals. In theory, animals should have the same codon that humans normally present, but we can find some animals that have the codon of the mutation that leads to either AD or PD in humans.


1- Introduction to molecular aspects of neurodegenerative diseases

1.1 Neurodegenerative diseases (ND)

Neurodegeneration is the name given to the loss of neurons’ function and structure. There are two causes of neurodegeneration: genetic inheritance, which represents 10% of the total cases of ND, and age and external factors yet to be established, which are believed to cause the other 90% (the so-called sporadic cases). However, some of these external factors are already known, such as pollution, viruses or prions, heavy metals, oxidation, nutrition (lack of vitamins) and, most importantly, age.

Neurodegenerative disorders (ND) are frequently related to the aggregation of some vulnerable peptides located in the cytoplasm or the nucleus of the neurons or in their proximity, forming protein or peptide accumulations. These disorders are known as proteopathies. The accumulation of such proteins can be caused by different factors, such as mutations, inappropriate expression, abnormal folding of the protein, post-translational modifications, oxidative stress, exposure to toxic external factors or abnormalities during the proteolytic process. This results in the decrease or loss of the proteins' function. Depending on the protein that forms the aggregates, they can be classified into tauopathies and synucleinopathies.

1.1.1 Tauopathies

Tauopathies are a group of diseases that have a specific pathological feature in common: the presence of abnormally phosphorylated Tau protein aggregates inside neuronal cells. The main disease in this group is Alzheimer’s Disease (AD).

Tau protein is abundant in the Central Nervous System (CNS), and it is enriched in the neurons’ axons. Its location is mainly cytosolic, although it can also be found associated with cell membranes. It is a microtubule-associated protein. Its function is to promote the assembly and stabilization of microtubules, so it is crucial in neurogenesis and axonal transport. The Tau gene is located on chromosome 17 and contains 16 exons.

When the balance between phosphorylation and dephosphorylation is altered, the Tau protein becomes irreversibly hyperphosphorylated. The result is the separation of Tau from microtubules. Its hyperphosphorylation leads Tau to fibrillate and compact, forming neurofibrillary tangles and other unwanted aggregations.

A common feature observed in AD is the presence of β-amyloid plaques in brain tissue. β-amyloid plaques are formed by the accumulation of extracellular cerebral β-amyloid. There are three main components of these amyloid structures. Two of them are found in all amyloids. These are the AP (proteic part) and mucopolysaccharides. The third component is specific for each amyloid. In our case, the specific component is amyloid peptide, derived from the proteolysis of the precursor protein of the amyloid peptide (APP).

The APP is found in the cell membrane; it contains over 700 amino acids and its function is yet to be determined. Its proteolysis can be carried out by three enzymes called α-, β- and γ-secretases. If it is cleaved by α and γ, the resulting peptide is not amyloidogenic. However, if it is cleaved by β and γ, the resulting peptide is amyloidogenic. Three kinds of amyloid plaques can be differentiated according to their function and their compaction level.

Image 1: tertiary structure of the protein β-amyloid3.

3 Extracted from “”.


1.1.1.2 Alzheimer’s Disease (AD)

AD is the most frequent cause of dementia in humans, and its major risk factor is age. It is a chronic ND, specifically a tauopathy. Clinically, the most common early symptom of AD is the inability to remember recent events. AD is characterized by a cognitive deficit which includes the loss of language, memory, and both perceptive and functional abilities.

The disease was first described by Alois Alzheimer, a German psychiatrist who in 1901 identified the first case of what is nowadays known as Alzheimer’s disease in a fifty-year-old woman, Auguste D. She died in 1906.

While macroscopically AD presents a cerebral atrophy in the temporal lobe leading to a progressive neuronal loss, from a more biological or neuropathological point of view, AD is defined or characterized by the appearance of two types of protein aggregates: Tau protein (especially in neurons and glial cells) and β-amyloid plaques, which accumulate in some brain regions. In order to develop the disease, the Tau protein must have been hyperphosphorylated.

Two types of AD can be distinguished: Early-onset Alzheimer’s Disease (Familial AD) and Late-onset Alzheimer’s Disease (Sporadic cases).

Familial Alzheimer's disease (FAD) is a form of early-onset Alzheimer's, and it is inherited and rare. It affects less than 10% of Alzheimer's disease patients, but its importance lies in the fact that it is genetic. FAD develops before age 65, even in people as young as 35. It is caused by mutations in some genes.

The majority of Alzheimer's disease cases are late-onset, usually developing after age 65. Late-onset Alzheimer's disease has no well-known cause and shows no obvious inheritance pattern. However, in some families, clusters of cases are detected. Although a specific gene has not been identified as the cause of late-onset Alzheimer's disease, genetic factors do appear to play a role in the development of this form of the disease and may act as risk factors4.

4 Malformations in the proteins apoE or Alpha-Synuclein have also been determined as risk factors that may lead to AD.

34 types of tau mutations that lead to diseases have been found. However, none of them causes FAD. Instead, FAD is related to mutations in the genes of other proteins, such as APP, Presenilin 1 and Presenilin 2, which increase the deposition of β-amyloid peptide.

The phenotype of patients suffering from familial AD is indistinguishable from that of those suffering sporadic AD.

Image 2: As the disease progresses, the pattern of affected regions containing tau deposits spreads all over the brain. (Image extracted from Dr. Muntané’s PhD thesis)

There are many hypotheses that try to account for the process by which amyloid plaques are formed and Tau protein is hyperphosphorylated. The main theory is called the amyloid cascade hypothesis. This theory proposes that excessive accumulation of β-amyloid is the key event in AD: this accumulation sets off a series of events that results in the death of brain cells and, eventually, in AD.

β-amyloid is formed from a large protein called amyloid precursor protein (APP), which is encoded by the APP gene. Researchers do not yet know the exact function of this protein, but it is known to be related to the activity of brain cells. Special enzymes called secretases, which are encoded by the genes PSEN1 and PSEN2, cut this protein at specific sites, and one of the resulting products is the amyloid-β peptide. These amyloid-β peptides join together to form oligomers and, according to the amyloid-β hypothesis, it is these oligomers that are toxic to brain cells, causing AD. Later, these oligomers form the plaques that are characteristic of AD.

1.1.2 α-synucleinopathies

α-synucleinopathies cover a group of ND with a common injury: aggregates of the protein α-synuclein in some populations of neurons. One such disease is Parkinson’s Disease (PD).

α-synuclein is a protein made up of 140 amino acids, found particularly in synaptic terminals. Its function is currently unknown, but it is thought to be linked to synaptic vesicle transport. This protein is especially linked to PD. It is encoded by the SNCA gene, located on chromosome 4.

In the aggregates of α-synuclein we can find the protein folded in an incorrect way and with post-translational modifications.

1.1.2.1 Parkinson’s disease (PD)

Parkinson’s Disease is also a ND, in particular a synucleinopathy. It is the second most important ND, right after AD. PD’s clinical symptoms are muscular stiffness, akinesia, trembling while resting and postural instability. Two types of PD can be distinguished: familial PD, which represents less than 5% of the cases and usually appears before age 45 (it is therefore early-onset PD (EOPD)), and sporadic cases, which account for the other 95% and are related to a more advanced age.

The disease was named in honor of James Parkinson, an English doctor who in 1817 published an essay reporting six cases of paralysis agitans. In it, he described the characteristic resting tremor, abnormal posture and gait, paralysis and diminished muscle strength, and the way that the disease progresses over time.

Biologically, it may be defined as the selective loss of over 60% of dopaminergic neurons in the substantia nigra (black substance), which results in motor disorder.

Macroscopically, to diagnose a brain affected by PD, we have to take a look at the black substance. As a result of the specific loss of neurons, the usual pigmentation of the brain turns pale in the affected zone. When the neuronal loss in the black substance surpasses 60% of the total, the first symptoms of the disease appear.

Microscopically, PD is defined by the appearance of Lewy bodies (LB) in the cytoplasm of neurons and Lewy neurites (LN) in the axons. Lewy bodies are abnormal aggregates of proteins that develop inside nerve cells in disorders such as PD, dementia, and some others. They are mainly formed by misfolded and post-translationally modified α-synuclein aggregates. Together with LB, we can also find other structures called pale bodies (which are in fact believed to produce the LB).

Image 3: Progressive summary of the affected zones in PD. (Image extracted from Dr. Muntané’s PhD thesis).

Image 4: Comparison between the black substance of a patient suffering from PD (right) and that of a person not affected by the disease (left). There is an important decrease in pigmentation, which is macroscopically observable (Image extracted from Dr. Muntané’s PhD thesis).


1.1.3 Comparison between AD (tauopathy) and PD (synucleinopathy)

We have seen that AD and PD are both ND, but the first one is catalogued as a tauopathy and the second one as a synucleinopathy. There are some main differences between these two disorders, which are summarised in the following table:

CHARACTERISTICS | ALZHEIMER | PARKINSON
Type of ND | Tauopathy | Synucleinopathy
Protein aggregations: where? | Tau; amyloid peptide | α-synuclein (Lewy Bodies)
Risks | Familial AD (10%) or sporadic cases (90%) | Familial PD (5%) or sporadic cases (95%)
Symptoms | Inability to remember recent events | Motor disorder
Consequences | Loss of language; memory loss; loss of perceptive and functional abilities | Muscular stiffness; akinesia; trembling while resting; postural instability

Table 1: Brief comparison between PD and AD5.

5 Source: own elaboration. The information has been extracted from several pages mentioned in the webgraphy. All further tables are also our own elaboration.


1.2 Implication of gene mutations in diseases

Mutations are random and sporadic alterations of the genetic information. They are known to be caused by radiation or chemical mutagens; however, most of the time the cause is unknown. Mutations may or may not produce discernible changes in the observable characteristics of an organism (phenotype). The sequence of a gene can be altered in a number of ways. We can distinguish small-scale mutations, which affect one or a few nucleotides of a gene, and large-scale mutations, which affect chromosomal structure. In this research only small-scale mutations (in particular, SNPs) affecting coding regions will be dealt with.

Small-scale mutations include three different types: point mutations, insertions and deletions.

Point mutations are substitutions: they exchange a single nucleotide for another. These changes are classified as transitions or transversions. A transition exchanges a purine for a purine (A ↔ G) or a pyrimidine for a pyrimidine (C ↔ T), and a transversion exchanges a purine for a pyrimidine or a pyrimidine for a purine (C/T ↔ A/G).
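The transition/transversion rule can be expressed as a short Python check. This is a minimal sketch for illustration; the function name `substitution_type` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-nucleotide substitution in DNA."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("no substitution: the bases are identical")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine): transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(substitution_type("A", "G"))  # transition
print(substitution_type("C", "A"))  # transversion
```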

Point mutations that occur within the protein-coding region of a gene may be classified into three kinds, depending on what the new codon codes for: silent mutations6, which code for the same amino acid; missense mutations, which code for a different amino acid; and nonsense mutations, which code for a stop codon and can truncate7 the protein.
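The three kinds of point mutation can be told apart by translating the codon before and after the change. The sketch below builds the standard genetic code table and classifies a codon change; it assumes DNA codons on the coding strand, and the function name `classify_point_mutation` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in the conventional
# T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as silent, missense or nonsense.

    The reference codon is assumed to code for an amino acid (not a stop).
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_point_mutation("GTC", "GTA"))  # silent (Val to Val)
print(classify_point_mutation("GAG", "GTG"))  # missense (Glu to Val)
print(classify_point_mutation("TAC", "TAA"))  # nonsense (Tyr to stop)
```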

In insertions one or more extra nucleotides are added into the DNA. Insertions in the coding region of a gene can significantly alter the gene product. In deletions one or more nucleotides are removed. Like insertions, these mutations can alter the reading frame of the gene.

6 This refers only to mutations affecting the coding region. Mutations that occur in a non-coding region of the gene (introns) are also silent mutations, as they do not alter the amino acid sequence.
7 Truncate: cut off; cut short; maimed.

The smallest mutations that may occur are called single nucleotide polymorphisms (SNPs). They are DNA sequence variations occurring in less than 1% of the population in which a single nucleotide (A, T, C or G) in the genome differs between members of a biological species or paired chromosomes. For example, if one fragment of DNA from one individual is AAAGTCTTA and the same fragment from another individual is AAAGTATTA, the nucleotide C has been replaced by an A. When this happens we say that there are two variant alleles8.
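The example with the two DNA fragments can be reproduced with a short Python sketch that lists the positions at which two sequences of equal length differ. The function name `differing_positions` is hypothetical, and positions are counted from 0.

```python
def differing_positions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (position, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


print(differing_positions("AAAGTCTTA", "AAAGTATTA"))  # [(5, 'C', 'A')]
```

A single reported position, as here, is what we would expect for a change affecting a single nucleotide.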

1.2.1 Penetrance

Penetrance refers to the proportion of individuals carrying a mutation that causes a particular disorder who express clinical symptoms of that disorder. A mutation does not always produce symptoms. For instance, if out of 100 individuals carrying a mutation only 30 express clinical symptoms of the related disorder, the penetrance is 30%. When all individuals carrying the mutation express its symptoms, the penetrance is 100%. This project focuses on SNPs with 100% penetrance.
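The arithmetic behind this definition is a simple percentage. A minimal Python sketch, with a hypothetical function name, could look like this:

```python
def penetrance(carriers: int, symptomatic: int) -> float:
    """Percentage of mutation carriers who express clinical symptoms."""
    if carriers <= 0:
        raise ValueError("there must be at least one carrier")
    if not 0 <= symptomatic <= carriers:
        raise ValueError("symptomatic carriers must be between 0 and carriers")
    return 100.0 * symptomatic / carriers


print(penetrance(100, 30))   # 30.0
print(penetrance(100, 100))  # 100.0
```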

1.2.2 SNP mutations and evolution: the dN/dS ratio

In genetics, the dN/dS ratio is the ratio of the number of nonsynonymous substitutions per nonsynonymous site (dN) to the number of synonymous substitutions per synonymous site (dS) in a gene. This ratio can be used as an indicator of evolution. In other words, the dN/dS ratio is used to infer the direction and magnitude of natural selection acting on protein-coding genes.

dN (alternatively designated Ka) is a measure of the degree to which two homologous coding sequences differ with respect to amino-acid content. Specifically, it indicates the degree to which two sequences differ at non-synonymous sites (nucleotide sites at which a substitution causes an amino-acid change). Formally, dN is the average number of nucleotide differences between the sequences per non-synonymous site.

8 Allele: one of a number of alternative forms of the same gene or same genetic locus.

dS (alternatively designated Ks) is a measure of the degree to which two homologous coding sequences differ with respect to silent nucleotide substitutions (substitutions that do not cause an amino-acid substitution). It indicates the degree to which two sequences differ at synonymous sites (sites at which a substitution does not cause an amino-acid substitution). Formally, dS is the average number of nucleotide differences between sequences per synonymous site9.
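Following the formal definitions of dN and dS, both values are differences divided by sites. The sketch below assumes that the numbers of differences and of sites have already been counted; the counts in the example are invented, and the function names are hypothetical. Real estimators also correct for multiple substitutions at the same site, which this sketch does not do.

```python
def per_site_rate(differences: float, sites: float) -> float:
    """Average number of nucleotide differences per site."""
    if sites <= 0:
        raise ValueError("the number of sites must be positive")
    return differences / sites


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> tuple[float, float, float]:
    """Return (dN, dS, dN/dS) from already-counted differences and sites."""
    dn = per_site_rate(nonsyn_diffs, nonsyn_sites)
    ds = per_site_rate(syn_diffs, syn_sites)
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn, ds, dn / ds


# Hypothetical counts: 3 non-synonymous differences over 300 non-synonymous
# sites, and 5 synonymous differences over 100 synonymous sites.
dn, ds, ratio = dn_ds(3, 300, 5, 100)
print(round(dn, 3), round(ds, 3), round(ratio, 3))
```

With these invented counts dN is smaller than dS, so the ratio is below 1.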

When dividing dN by dS, the result can be:

1. If dN/dS = 1, amino-acid substitutions may be largely neutral. However, there is also the possibility that positive selection just cancels purifying selection, so that some amino-acid substitutions were driven by natural selection. This situation is thus ambiguous.

2. If dN/dS < 1, purifying selection (selection against deleterious non-synonymous substitutions) has definitely operated. Some amino-acid substitutions may have been caused by selection, just not enough to overcome the effects of purifying selection. Purifying selection is also known as stabilizing or negative selection.

3. If dN/dS > 1, selection has caused some amino-acid substitutions. Some substitutions may also have been caused by genetic drift. Purifying selection also likely operates, but is not strong enough to overcome the effects of positive selection10. A ratio greater than one implies positive selection, which means that, by natural selection, a new genotype is favored over another genotype.
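The three cases above can be summarised as a small decision rule. This is a minimal sketch, assuming that dN and dS have already been estimated elsewhere; the function names and the `tolerance` parameter are hypothetical and were not part of the original project.

```python
def dn_ds_ratio(dn: float, ds: float) -> float:
    """Divide dN by dS; dS must be positive for the ratio to be defined."""
    if ds <= 0:
        raise ValueError("dS must be greater than zero")
    return dn / ds


def interpret_dn_ds(ratio: float, tolerance: float = 0.0) -> str:
    """Map a dN/dS ratio onto the three cases described in the text."""
    if ratio < 1 - tolerance:
        return "purifying (negative) selection"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "neutral or ambiguous"


# Hypothetical values, chosen only to illustrate each case.
print(interpret_dn_ds(dn_ds_ratio(0.02, 0.20)))
print(interpret_dn_ds(dn_ds_ratio(0.30, 0.10)))
print(interpret_dn_ds(1.0))
```

In practice a ratio is rarely exactly 1, so a non-zero `tolerance` may be a more realistic choice; the value to use is left open here.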

9 Obtained from the website http://sites.biology.duke.edu/rausher/DNDS.pdf
10 Obtained from the website http://sites.biology.duke.edu/rausher/DNDS.pdf


1.3 Protein expression and its implication in diseases

As we have seen in this introduction to NDs, proteins are key to the development of both AD and PD. Proteins are the result of the transcription of a gene, a genomic region of DNA, into a strand of messenger RNA, and the later translation of this RNA into an amino acid strand. Proteins have a great many functions: hormonal and enzymatic ones, participation in contraction and transport, and action as antibodies in the immune system, among others.

1.3.1 Deoxyribonucleic acid

DNA stands for deoxyribonucleic acid. It is a nucleic acid, a biopolymer found mainly in the nucleus of eukaryotic cells and in the cytoplasm of prokaryotic ones. It contains the genetic information of the organism, essential to its development and function. This information is transmitted to its descendants through reproduction.

DNA is a sequence of nucleotides joined to one another in a chain by phosphodiester bonds. Each nucleotide is formed by a deoxyribose (a monosaccharide sugar), a nitrogen-containing nucleobase (adenine (A), guanine (G), thymine (T) or cytosine (C)) and a phosphate group. A and G are classified as purines, and C and T as pyrimidines. As the nucleobases are the only part of the nucleotides that changes, they are what defines the sequence of the chain. The basic structure of DNA consists of two complementary and antiparallel strands, so if one strand runs in the 5’→3’ direction11, the complementary strand runs 3’→5’. They are called forward and reverse strands, and the name given to one and the other is arbitrary. However, only one of the two strands is read when a given sequence is transcribed. The strand that is read to produce the mRNA is called the non-coding strand or template strand, and the one that is not read is called the coding strand or sense strand. The coding strand therefore has the same bases as the RNA transcript, except that thymine is replaced by uracil (a pyrimidine); that is why it is called coding: each DNA codon12 corresponds to the same codon in the RNA strand (except for thymine). In some genes the coding strand corresponds to the chromosome's forward strand, and in others it corresponds to the reverse strand. The following example may help to clarify the concept:

5'...ATGGCCTGC...3' coding strand
3'...TACCGGACG...5' non-coding strand
5'...AUGGCCUGC...3' mRNA

Example 1: DNA transcription

11 This directionality means that reading starts at carbon 5, which has a phosphate group attached, and ends at carbon 3, where a new nucleotide would be attached if the strand were to be extended.
12 Codon: in DNA or RNA, a sequence of three nucleotides that codes for a certain amino acid.
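The relationship between the three strands can also be checked programmatically. The sketch below is illustrative only: it starts from the coding strand used in Example 1, and the function names are hypothetical helpers. The template strand is obtained by pairing each base (A with T, G with C), and the mRNA by replacing thymine with uracil in the coding strand.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def template_strand(coding: str) -> str:
    """Return the complementary strand, written 3'->5' so it aligns base by base."""
    return "".join(COMPLEMENT[base] for base in coding)


def transcribe(coding: str) -> str:
    """Return the mRNA: same bases as the coding strand, with U instead of T."""
    return coding.replace("T", "U")


coding = "ATGGCCTGC"
print("5'..." + coding + "...3' coding strand")
print("3'..." + template_strand(coding) + "...5' non-coding strand")
print("5'..." + transcribe(coding) + "...3' mRNA")
```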

DNA is always kept in the nucleus (of eukaryotic cells), so, in order for the information it contains to be expressed, it has to be transcribed and then translated.

1.3.2 Transcription of the DNA and formation of the mRNA strand

The process of transcription consists of converting a sequence of deoxyribonucleotides into a sequence of ribonucleotides (A, G, C and U). This is done through complementarity between nucleobases: A pairs with U (or T), and G pairs with C. RNA polymerase’s function is to form the mRNA strand; it moves along the template strand from 3’ to 5’. That is why the new sequence of RNA grows in a 5’→3’ direction.

Image 5: Representation of the process of transcription13.

13 Extracted from: http://www.nvo.com/jin/homepage21/


1.3.3 mRNA translation

Once the mRNA strand is formed, the process of translation begins. In the RNA strand, three ribonucleotides form a codon, which codes for a specific amino acid according to the genetic code. This correspondence is the same in almost all living beings. There are 64 possible codons: 61 of them are translated into the 20 possible amino acids and the remaining three are stop codons.
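The figure of 64 codons follows from having four possible bases at each of three positions (4 × 4 × 4). As an illustrative sketch only, the standard genetic code can be built in Python from a compact string of one-letter amino acid symbols listed in U, C, A, G order; the variable names are our own, and "*" marks a stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one symbol per codon in UCAG order; "*" is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Pair every three-base combination with its amino acid symbol.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = {aa for aa in CODON_TABLE.values() if aa != "*"}

print(len(CODON_TABLE))  # 64
print(len(amino_acids))  # 20
print(stop_codons)       # ['UAA', 'UAG', 'UGA']
```

Since 61 codons are shared among 20 amino acids, most amino acids are necessarily encoded by more than one codon.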

Image 6: Genetic Code (codons which stand for each amino acid). Notice that some codons are synonymous (despite being different, they codify for the same amino acid14).

The sequence of amino acids in the chain determines which protein is formed and which function it will perform. Proteins have four levels of structure, which are also important in determining a protein’s function; therefore, amino acid changes due to mutations may alter a protein’s structure and function.

1.3.4 Amino acids

Amino acids are organic molecules built around an α-carbon bonded to an amino group (-NH2), a carboxyl group (-COOH), a hydrogen atom and a variable side chain. They differ only in this side chain, represented by the R group in the following image:

14 The genetic code is redundant in the way that multiple amino acids can be coded by more than one codon (synonymous codons).


Image 7: The chemical structure of an alpha amino acid in its un-ionized form15.

These R groups are side chains, the part that differs in each amino acid. According to this, 20 amino acids can be distinguished, which are the following:

Image 8: The 20 amino acids16

15 Image extracted from https://en.wikipedia.org/wiki/Amino_acid#/media/File:AminoAcidball.svg
16 Image extracted from http://www.uic.edu/classes/bios/bios100/lectures/chemistry.htm


According to the nature of the R group, which confers to the amino acid specific chemical properties, these may be classified in four main groups:

Groups: Amino acids

1. Non-polar and neutral: Alanine, glycine, isoleucine, leucine, methionine, phenylalanine, proline and valine.

2. Polar and neutral: Serine, threonine, tyrosine, asparagine, cysteine, glutamine and tryptophan.

3. Polar and acidic: Aspartic acid and glutamic acid.

4. Polar and basic: Arginine, histidine and lysine.

Table 2: Classification of amino acids according to their R group.
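The classification in Table 2 lends itself to a simple lookup, which makes it easy to check whether an amino acid substitution stays within one group or moves between groups. The following Python sketch is an illustration only: the three-letter codes are the standard abbreviations, while the dictionary and function names are hypothetical and follow the grouping of Table 2 as written.

```python
# Group numbers as in Table 2, keyed by three-letter amino acid code.
GROUPS = {
    1: ["Ala", "Gly", "Ile", "Leu", "Met", "Phe", "Pro", "Val"],  # non-polar, neutral
    2: ["Ser", "Thr", "Tyr", "Asn", "Cys", "Gln", "Trp"],         # polar, neutral
    3: ["Asp", "Glu"],                                            # polar, acidic
    4: ["Arg", "His", "Lys"],                                     # polar, basic
}

# Invert the table: amino acid -> group number.
GROUP_OF = {aa: group for group, members in GROUPS.items() for aa in members}


def group_change(original: str, mutated: str) -> str:
    """Describe a substitution as a move between Table 2 groups, e.g. '3 to 2'."""
    return f"{GROUP_OF[original]} to {GROUP_OF[mutated]}"


print(len(GROUP_OF))               # 20 amino acids in total
print(group_change("Glu", "Gln"))  # acidic to polar neutral
print(group_change("Val", "Ile"))  # stays within the non-polar group
```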

The most important property of amino acids is their amphoteric character. This means that in an acidic medium they behave as a base and accept protons, so they become positively charged. By contrast, in a basic medium they act as an acid and donate protons, so they become negatively charged.

The following example shows visually how amino acids are encoded by the DNA sequence:

5'...ATGGCCTGC...3' coding strand
3'...TACCGGACG...5' non-coding strand
5'...AUGGCCUGC...3' mRNA
Met-Ala-Cys protein chain

Example 2: transcription and translation of the DNA strand.
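The two steps of Example 2 can be sketched in a few lines of Python. This is a deliberately reduced illustration: `PARTIAL_CODE` contains only the three codons that appear in the example (AUG, GCC and UGC), not the full genetic code, and the function names are hypothetical.

```python
# Only the three codons used in Example 2; NOT the complete genetic code.
PARTIAL_CODE = {"AUG": "Met", "GCC": "Ala", "UGC": "Cys"}


def transcribe(coding: str) -> str:
    """Coding strand -> mRNA: thymine is replaced by uracil."""
    return coding.replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA codon by codon and join the amino acids with hyphens."""
    usable = len(mrna) - len(mrna) % 3  # ignore an incomplete trailing codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return "-".join(PARTIAL_CODE[codon] for codon in codons)


mrna = transcribe("ATGGCCTGC")
print(mrna)             # AUGGCCUGC
print(translate(mrna))  # Met-Ala-Cys
```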


1.3.5 Proteins involved in EOAD and EOPD

A mistake in the sequence of nucleotides can result in the mistranslation of a sequence of amino acids, which can lead to the malformation of a protein. Additionally, the malformations of proteins can result in the expression of some diseases, such as PD or AD.

The following tables show the main characteristics of the proteins that cause EOPD and EOAD when the genes that code for them carry certain missense, single-nucleotide mutations17:

PROTEINS RELATED TO EARLY-ONSET-AD

Protein: Characteristics and function

Amyloid-β (A4) precursor protein (APP); encoded by APP gene
● Structure: transmembrane protein. Length: 770 amino acids.
● Biological functions:
1. Synaptic formation and repair.
2. A type of neuronal transport.
3. Facilitation of iron export.
4. Hormonal regulation.

Presenilin 1 (PSEN1); encoded by PSEN1 gene
● Structure: possesses a 9 transmembrane topology. Length: 467 amino acids.
● Biological functions:
1. Plays an important role in β-amyloid regulation.
2. Plays an important role in proteolytic process.

Presenilin 2 (PSEN2); encoded by PSEN2 gene
● Structure: single chain. Length: 448 amino acids.
● Biological functions:
1. Regulates APP.

Table 3: Main characteristics of proteins involved in EOAD.

17 See the next section for more detailed information about mutations.


PROTEINS RELATED TO EARLY-ONSET-PD

Protein: Characteristics and function

Parkin (ligase); encoded by PARK2 gene
● Structure: length: 466 amino acids.
● Biological functions:
1. Homeostasis18 in the covalent attachment of ubiquitin to specific substrates.

PTEN-induced putative kinase 1 (PINK1); encoded by PINK1 gene
● Structure: length: 581 amino acids.
● Biological functions:
1. Involved in mitochondrial quality control by identifying damaged mitochondria and targeting specific mitochondria for degradation.

Protein deglycase DJ-1 (PARK7); encoded by DJ1 gene
● Structure: length: 189 amino acids.
● Biological functions:
1. It protects neurons against oxidative stress and cell death.

Probable cation-transporting ATPase 13A2 (PARK9); encoded by ATP13A2 gene
● Structure: length: 1180 amino acids.
● Biological functions:
1. Plays a role in intracellular cation homeostasis and the maintenance of neuronal integrity.

F-box only protein 7 (FBXO7); encoded by FBXO7 gene
● Structure: length of 522 amino acids.
● Biological functions:
1. Plays a role downstream of PINK1 in the clearance of damaged mitochondria.

85/88 kDa calcium-independent phospholipase A2; encoded by PLA2G6 gene
● Structure: length of 806 amino acids.
● Biological functions:
1. Catalyses the release of fatty acids from phospholipids.
2. Participates in Fas-mediated apoptosis19 and in regulating transmembrane ion flux in B-cells.

α-synuclein; encoded by SNCA gene
● Structure: length of 140 amino acids.
● Biological functions:
1. Involved in the regulation of dopamine release and transport.
2. Induces fibrillation of microtubule-associated protein tau.
3. Reduces neuronal responsiveness to various apoptotic stimuli.

Leucine-rich repeat serine/threonine-protein kinase 2; encoded by LRRK2 gene
● Structure: length of 2,527 amino acids.
● Biological functions:
1. Plays a role in the retrograde trafficking pathway for recycling proteins.
2. Regulates neuronal process morphology in the intact central nervous system.
3. May play a role in the phosphorylation of proteins central to Parkinson disease.

Vacuolar protein sorting-associated protein 35; encoded by VPS35 gene
● Structure: length of 796 amino acids.
● Biological functions:
1. Intracellular protein transport.
2. Negative regulation of late endosome to lysosome transport.
3. Retrograde transport, endosome to Golgi.
4. Retrograde transport, endosome to plasma membrane.
5. Transcytosis20.

Table 4: Main characteristics of proteins involved in EOPD.

18 The tendency of a system, especially the physiological system of higher animals, to maintain internal stability, owing to the coordinated response of its parts to any situation or stimulus that would tend to disturb its normal condition or function. Extracted from http://dictionary.reference.com/browse/homeostasis
19 Apoptosis: programmed cell death.
20 Transcytosis is a mechanism for transcellular transport in which a cell encloses extracellular material in an invagination of the cell membrane to form a vesicle, then moves the vesicle across the cell to eject the material through the opposite cell membrane by the reverse process. It is also called vesicular transport.


2- Working questions and hypotheses

The main objective of this project is to learn about small-scale mutations, SNPs, which may occur in genes related to EOAD and EOPD in human beings, and to look for their appearance in the genomes of other animals.

Throughout this research, we have tried to answer some basic questions about AD and PD, but after going deeper into our research and talking to some specialists, we formulated some new questions. The chronology of the questions and hypotheses which arose from the beginning is presented below:

1) The moment when we decided to carry out a research project about AD and PD was when we realised that people suffer great damage because of these two diseases and we did not know much about them. To sum up, we could say that the primary questions were: what are AD and PD? Can they present genetic causes?

2) Once we had read about AD and PD and had had a meeting with a specialist, Dr. Santpere, we were finally able to answer our first questions. We learnt that humans are affected by AD and PD not only when they age but also when they are young (EOND). Thanks to specialists Gerard Muntané and Gabriel Santpere, we also learnt that some single-nucleotide polymorphisms (SNPs) in genes related to PD and AD cause the early-onset development of these diseases in human beings. Knowing this, Dr. Santpere guided us to formulate the next question: if animals happened to present those same SNPs in their ancestral sequence, would they develop EOPD or EOAD as humans do? We also learnt that some animals are transgenically modified to develop AD or PD for experimental purposes, but can they develop EOND naturally? We then decided to accept Dr. Santpere’s challenge21.

The hypotheses we then formulated to answer these questions before doing the research, were the following:

1) Our guess before beginning our research was that, when it comes to late-onset ND, animals may develop it less often than humans do: as animal life in the wild seems shorter22, they should be less prone to develop LOND. In addition, we must bear in mind that we are artificially lengthening our lives. Years ago, human beings had shorter lives because medicine and technology were not as developed as they are nowadays. LOND appears because of aging and degeneration of the brain, and since we are extending our aging period and dying later than we otherwise would, the probability that we suffer from these late-onset diseases is greater than it is for animals.

21 In fact, in the second round of our project, our work has been especially focused on finding an adequate response to the second question.

2) Our first thought was that animals can suffer from EOND. However, in the case of AD, it is more difficult to recognise its symptoms in animals, so we hypothesized that they may develop it without us humans being able to notice it.

3) We did not formulate a formal hypothesis or guess to answer the question of whether animals would develop EOAD or EOPD if they presented the same codon as the human mutation. We did not have any previous knowledge in this field of expertise, so both answers, positive or negative, seemed quite plausible to us. The question, thus, remained open.

4) By doing the research we realised that SNPs that cause EOAD or EOPD are suffered by few people, so our hypothesis was that they are negatively selected; in other words, the new genotype and phenotype that they cause is not favoured over the healthy phenotype and genotype, which means that, at the moment, natural selection does not favour mutations which cause EOAD and EOPD.

5) Finally, we expected that these SNPs would result in the codification of an amino acid with very different characteristics from the ancestral one, thus causing great damage to the organism.

22 (It is believed that body mass and brain mass have some influence on it. Those who have more mass live longer than those with a smaller mass.)


3- Methodology

In order to achieve the goals of this research project (i.e., to be able to give answers to our “working questions”), we had to immerse ourselves into the world of genomics, learn how it works and try to get the right approach to genomics by searching, arranging and using the available public data on genomes, which may be consulted in several databases.

However, the reader should bear in mind that we were entering a field totally unknown to us. In order not to proceed just by guessing, we needed someone to guide us through the project. That is why, on December 12th 2014, we arranged a meeting at the PRBB with Dr. Gabriel Santpere Baró, who at that time worked at the IBE-CSIC and UPF as a researcher in evolutionary biology and who had obtained his PhD in experimental biology on AD.

There, he told us what the different steps of the project should be and which tools we would need. Firstly, he introduced us to the basic theoretical aspects we needed to know, most of which are explained in our project; then he told us how to start and gave us hints on what would follow. Due to our lack of expertise in this field, we had to watch a lot of tutorials to learn how to use all the available databases and navigate the specialised websites we needed.

What we had to do first was to get a list of the genes that directly cause early-onset PD or AD. To find out which they were, we first made a bibliographic search to establish which proteins are involved in these diseases and how they act in AD or PD; after establishing the proteins, we consulted the OMIM database (www.omim.org) to find out which of them were involved in EOAD. OMIM stands for Online Mendelian Inheritance in Man, and it is a database of Mendelian genes and diseases. We found out that, of the 4 genes known to be involved in AD (APOE, APP, PSEN1 and PSEN2), only 3 were involved in EOAD (some of the allelic variants of these 3 contained mutations which caused the disease). The genes involved in EOAD are PSEN1, PSEN2 and APP.

Then, once we had accessed the OMIM website, we searched for each of the genes that directly cause the early-onset disease. We then looked at its allelic variants, to see all the mutations described in this gene. We also used Alzforum (www.alzforum.org) to help us find as much information as possible about each mutation and, with all the gathered information, we built a worksheet for each gene with all the mutations we were interested in. So, by working with these two databases we managed to make a catalogue of Mendelian mutations. Figure 1 shows the steps we followed to do so.

On the other hand, the genes involved in PD are the object of current research. After we had found all the genes and mutations involved in EOAD, but only scarce and contradictory information about the Mendelian mutations involved in PD, Dr. Santpere kindly provided us with the current and valid list of genes involved in PD, which are PARK2, ATP13A2, DJ1, FBXO7, LRRK2, VPS35, PINK1, PLA2G6 and SNCA. For these genes, we also had to find all the allelic variants that contain the mutations which cause the disease, and we repeated the procedure followed for AD. To make it more visual and easier to understand, the procedure is explained step by step with pictures between the paragraphs, as it was hard for us to learn how to navigate OMIM. (See image below).

In order to build the table, first we went to OMIM webpage, just looking for it in Google, although you can also use the direct link www.omim.org. Once in the webpage, we typed the name of the gene we wanted to look for and we pressed search.


Next, we clicked on the first option, the one that gives information about the gene of interest, in this case PSEN1. Then, on the left side of the screen we could see Allelic variants, and we clicked there to see all the different mutations of each gene. Once there, we just had to take the useful information we could find in order to build the tables, which are shown below.


Figure 1: Example of the steps followed in order to obtain the necessary information about all the selected genes which we needed to build our basic worksheets23.

23 All the images from the methodology are screenshots we have taken.


OMIM is a really thorough website, as it gives a great deal of information. However, we realized that www.alzforum.org directly gave us the information we needed, but only about the mutations in the genes APP, PSEN1 and PSEN2, the ones typical of AD. So just by searching for the name of the mutation obtained from OMIM, all the information related to that specific mutation appeared.

Figure 2: Example of a worksheet with the gathered information about one of the genes, after having compiled all the information obtained by checking OMIM and Alzforum.

Unfortunately, Dr. Santpere had to move out of the country for work, so he could no longer answer our questions or guide us; Dr. Gerard Muntané Medina kindly offered to replace him. That is why, on July 20th 2015, we returned to the PRBB, where Dr. Muntané24 also works, to decide which steps we had to take to continue our project. We had completed the first part of the tables, but some information, such as the coordinates, damaging and frequency of the mutation, was still missing.

24 For more information about him go to: http://www.researchgate.net/profile/Gerard_Muntane


Once we had gathered as much information as possible about each mutation from OMIM, we had to use other websites to find the coordinates of the mutation, its damaging (both in the SIFT and Polyphen databases), its frequency and a sequence of 21 nucleotides. The reason why we took 21 nucleotides is that the mutations we are looking at are changes of a single nucleotide (SNP: single nucleotide polymorphism), which may lead to a change in the amino acid, and each amino acid is encoded by three nucleotides. In addition, we needed a sequence around the mutation to align the human sequence with those of other animals, and the mutation was always in the middle of the sequence, usually in the fourth codon. So having this small sequence helped us in those two ways. With 21 nucleotides we had seven full codons, and the one affected by the mutation was left in the middle. In addition, 21 nucleotides are enough to look for the mutation in a specific region of the gene.
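The choice of 21 nucleotides can be expressed as a simple slicing rule: take the affected codon plus three codons on each side. The sketch below is only an illustration of that idea in Python; in the project the sequences were read from the websites by hand, and both the function name and the toy coding sequence are hypothetical.

```python
def codon_window(cds: str, codon_number: int, flank: int = 3) -> str:
    """Return the codon at 1-based position codon_number with `flank` codons on each side."""
    start = (codon_number - 1 - flank) * 3
    end = (codon_number + flank) * 3
    if start < 0 or end > len(cds):
        raise ValueError("window runs past the end of the coding sequence")
    return cds[start:end]


# Toy coding sequence of ten codons (not a real gene).
toy_cds = "AAACCCGGGTTTATGCATGACTACAGTCGA"
window = codon_window(toy_cds, codon_number=5)

print(window)        # seven codons, 21 nucleotides
print(len(window))   # 21
print(window[9:12])  # the fourth codon of the window is the affected one
```

With `flank=3` the affected codon always falls in the fourth position of the window, which matches the layout described above.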

With the SNP’s code, we were able to find the coordinates of the mutation at Ensembl (www.ensembl.org). On this same web page we clicked on “See all predicted consequences”.


Figure 3: Steps followed in order to find important information about the mutation, such as the damaging (blue square), the nucleotide change in the codon (orange square), the amino acid change (green square) and its position in the protein (black circle).


Once we had clicked there, plenty of information appeared. You can see the damaging (with a blue square around it) as well as the codon change in the DNA coding strand (with an orange square around it) and the amino acid change (with a green square around it), which were useful to ensure that the information on our worksheets was correct. In order to know which mutation you are studying, you must take a look at the additional information, such as the position in the protein (surrounded by a black circle) or the position in CDS25.

25 The position in CDS is the nucleotide position within the coding sequence (CDS).


Figure 4: This table shows important information about the mutation, such as the damaging (blue square), the nucleotide change in the codon (orange square), the amino acid change (green square) and its position in the protein (black circle).


There were times when we could not tell which case our mutation corresponded to, as the damaging was different in all of them. In these cases, where we were not able to identify our mutation among all the options and each of them showed a different damaging, we decided to include them all in the tables, hoping to be able to sort this out later. That is why more than one damaging is shown for some allelic variants.

To find the sequence of 21 nucleotides, only the coordinates of the mutation are needed. On the UCSC website (genome.ucsc.edu), the sequence of most parts of the gene and the amino acids it codes for could be found simply by entering the coordinates. However, in some cases the coding strand is shown and in others the template strand. That is why some tables containing the information on the mutations of the genes have the non-coding strand sequence (the DNA codon is antiparallel and complementary to the RNA codon)26 and others the coding strand sequence (the DNA codon is the same as the RNA codon but with thymine instead of uracil)27. The results are all unified in the coding strand sequence.
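Unifying the sequences in the coding strand amounts to taking the reverse complement of any sequence that was obtained from the template strand. The following Python sketch illustrates that conversion; in the project it was done by hand, and the function names and the `is_template` flag are hypothetical.

```python
# Translation table for Watson-Crick complements of DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse so the result reads 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def to_coding_strand(seq: str, is_template: bool) -> str:
    """Return the sequence as it appears on the coding strand, read 5'->3'."""
    return reverse_complement(seq) if is_template else seq


# Template strand of Example 1, written 5'->3' (that is, read from right to left).
print(to_coding_strand("GCAGGCCAT", is_template=True))   # ATGGCCTGC
# A sequence already on the coding strand is returned unchanged.
print(to_coding_strand("ATGGCCTGC", is_template=False))  # ATGGCCTGC
```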

26 These are APP, PSEN1, PSEN2, ATP13A2, PARK2, PLA2G6, SNCA and VPS35.
27 These are DJ1, FBXO7, LRRK2 and PINK1.


Figure 5: Steps followed in order to obtain the sequence of the genomic region of the mutation through the UCSC website. As we can see, ATG in the DNA coding strand sequence corresponds to AUG in the RNA, which codes for Met.


Figure 6: Final columns of the PSEN1 Excel table, after having completed them with the information from UCSC and other key information from Ensembl, such as the damaging.

Once we had obtained the basic information of each mutation, we wanted to see if there are animals that do have these mutations in the corresponding genomic region. In order to do so, we designed sequence alignments of the corresponding human gene as well as the genes of ten other animals, listed in the following table:

COMMON NAME      SCIENTIFIC NAME

Chimpanzee       Pan troglodytes
Gorilla          Gorilla gorilla
Orangutan        Pongo abelii
Olive baboon     Papio anubis
Macaque          Macaca mulatta
Vervet-AGM       Chlorocebus sabaeus
Marmoset         Callithrix jacchus
Horse            Equus caballus
Pig              Sus scrofa
Mouse            Mus musculus

Figure 7: Names and pictures of the animals we chose and of whom we designed the gene alignments.

There are some reasons for having chosen these animals. First of all, information is not available for every animal. Primates are the ones that have been studied most in depth, together with the pig and the mouse. Therefore, all of our animals are mammals and most of them are primates, as we needed as much information as possible and we were interested in keeping the evolutionary distance as short as possible. Additionally, primates were of great interest to us for observing the evolutionary process, as our common ancestor with them is more recent. However, we could not work on all of them. The bonobo, for example, is missing, as we could not find its gene sequence.

To make the alignment, we started, as usual, by accessing the Ensembl webpage (www.ensembl.org) and looking for the gene we wanted to align.


Once there and having clicked, in this example, PSEN1 (Human Gene), on the left we could see Genomic Alignments, so we clicked there. This action led us to a page where we could select or unselect which animals we wanted from “Configure this page”, as well as our display options:


Once our page had been configured, our alignment was ready to appear on screen. But first we realised that there were different blocks, each of them containing different regions of the gene. Therefore, we had to look at the coordinates of our mutations and see which block we could find them in. We could also download the gene, but this did not prove to be of any use for the purpose of our research. In the example, the first mutation in the table, M146L, was found in coordinates 14:73173663, so we found it in block 1:



Once we had clicked on “Block 1”, the alignment was shown on the screen, as well as a view of the location in the chromosome and a phylogenetic tree:

Figure 8: Steps followed in order to obtain a view of the chromosome with the exact location of the gene, the phylogenetic tree and, finally, the alignment of all the selected species sequences.


After obtaining the alignment and knowing our exact coordinates, we had to look for them. However, only the coordinates of the nucleotides of the beginning and end of each line were shown. Therefore, we had to count until we found our exact nucleotide.
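The manual counting described above can be sketched as a small routine. This is an illustration under stated assumptions, not the procedure Ensembl itself uses: we assume that each alignment line starts at a known coordinate, that coordinates increase from left to right, and that gap characters ("-") do not consume a coordinate. The function name and the toy alignment line are hypothetical.

```python
def column_of_coordinate(aligned_seq: str, line_start: int, target: int) -> int:
    """Return the 0-based column in aligned_seq that holds genomic position `target`."""
    position = line_start - 1
    for column, symbol in enumerate(aligned_seq):
        if symbol == "-":
            continue  # alignment gaps do not advance the genomic coordinate
        position += 1
        if position == target:
            return column
    raise ValueError("target coordinate is not on this line")


# Toy line starting at coordinate 100, with a two-column gap in the middle.
line = "ACG--TTA"
column = column_of_coordinate(line, line_start=100, target=103)
print(column)        # 5
print(line[column])  # T
```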

In the previous image we can see an example of the alignment. The black arrow marks the affected nucleotide, in this case, Thymine. The blue square shows the codon that may contain the mutation. Finally, the green square shows the sequence of 21 nucleotides we have included in our tables, which is composed of seven codons. However, this sequence may not be the same as the one we find in the tables with all the information about each mutation, as Ensembl gives us the sequence of coding DNA, but sometimes the one UCSC gives us is the template strand. That is why we always checked the information we had gathered to know exactly what we were looking at.

Finally, after having gathered all the sequences of each mutation, we could start to design our own comparative tables, retrieving some information from the previous tables:


Figure 9: Example of the genomic alignment of the first two mutations in PSEN1. The red codon is the one affected by the mutation, with the specific nucleotide in bold. In the nucleotide column, the codons that differ from the human ancestral codon are shown in red; in the amino acid columns, the amino acid encoded by the codon is shown in green if the change is synonymous, and in red if the encoded amino acid is either the one produced by the mutation or a different one.
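The colour coding used in these comparative tables corresponds to a three-way comparison between the human codon and the codon found in another species. The Python sketch below illustrates that comparison with the standard genetic code; it is not the tool used in the project (the tables were filled in by hand), and the function name and example codons are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code for DNA codons in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def compare_codon(human: str, other: str) -> str:
    """Classify another species' codon relative to the human codon."""
    if other == human:
        return "identical"
    if CODON_TABLE[other] == CODON_TABLE[human]:
        return "synonymous"      # different codon, same amino acid
    return "non-synonymous"      # different codon, different amino acid


print(compare_codon("ATG", "ATG"))  # identical
print(compare_codon("GCC", "GCT"))  # synonymous: both code for alanine
print(compare_codon("ATG", "CTG"))  # non-synonymous: methionine versus leucine
```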

Another aspect we wanted to look at was the ratio of non-synonymous to synonymous changes (dN/dS) in our portions of the genome, comparing them to whole-genome dN/dS ratios. To do so, we accessed Ensembl (www.ensembl.org) and looked for the gene, in this example PSEN1. We then clicked on the link that directed us to “PSEN1 (Human Gene)”. Once there, in the left column we could see a section called “Comparative Genomics”, with an option called “Orthologues28”; we clicked there, and from that page we were able to look at the dN/dS.

28 Orthologue genes are considered homologous gene sequences found in different species (they come from a common ancestral gene).


Once here, we could select our species by clicking on “Configure this page”, so we selected the following:
- Chimpanzee
- Gorilla
- Horse
- Human
- Macaque
- Marmoset
- Mouse
- Olive baboon
- Orangutan
- Pig
- Vervet-AGM

Once the species were selected, we confirmed the selection and, by scrolling down, we found what we were looking for, namely, the dN/dS.


Figure 10: Steps followed in order to look at the dN/dS of each gene.

So gathering all these results from all the genes, we designed a table:

Figure 11: Table resulting from the gathered information regarding dN/dS.


4- Results and discussion

In order to analyze our results in an easier and more visual way, we designed summary tables29 for each gene containing the amino acid change of our mutations, its predicted damaging according to the amino acid properties and its geographic origin. We then filled in two blank maps30, one for each disease, to see in which countries there had been individuals with mutations in a specific gene that caused AD or PD. Each gene is identified with a different colour. With these maps, our goal is to see which geographic area the mutations in each gene come from. On these maps we located the individuals genotyped with an SNP mutation.

On the other hand, we also created another table for each gene reflecting the results of the animals’ alignments, paying attention to those whose ancestral sequence codes for a different amino acid. With these, we tried to predict whether the change was major or not, based on the differences between amino acids from different groups. Also, with the results of these last tables, we were able to rank the animals by the number of sequence changes they present, in order to see whether they matched our predictions: those that are phylogenetically further from us should be the ones with more differences in comparison to us humans.

Finally, we would like to point out that, during the project, a great many questions arose that we could not answer because of our lack of expertise in this field. Many of them came up while we were looking for references on the subject, especially regarding animals suffering from AD or PD. That is why we interviewed some experts, hoping that they could help us answer them.

To sum up, we have found and confirmed that the proteins involved in EOAD are APP, PSEN1 and PSEN2, and the ones involved in EOPD are ATP13A2, DJ1, FBXO7, LRRK2, PARK2, PINK1, PLA2G6, SNCA and VPS35. Some SNP mutations in the genes that code for these proteins (each gene carries the name of its protein) can lead to EOAD or EOPD.

29 All of the result tables can be found in the digital section of annexes.
30 The maps are found at the end of the project, in annexes.

4.1 Mutations leading to EOAD and comparison with animals’ genomes

Across all the animals, we found thirty-eight differences in the codons affected by the mutations leading to EOAD in humans, and all of them are synonymous, that is, they translate into the same amino acid.

4.1.1 APP gene

We have found that there are 15 SNP mutations in the human APP gene that can cause early-onset familial AD. These mutations are:

| MUTATION | AMINO ACID CHANGE | DAMAGING (amino acid properties)31 | GEOGRAPHIC ORIGIN |
| --- | --- | --- | --- |
| E693Q | Glu to Gln | 3 to 2 | Dutch |
| V717I | Val to Ile | 1 to 1 | Japanese and Canadian |
| V717F | Val to Phe | 1 to 1 | Indiana |
| V717G | Val to Gly | 1 to 2 | Unknown |
| V717L | Val to Leu | 1 to 1 | American, Caucasian family of English ancestry |
| A692G | Ala to Gly | 1 to 2 | Dutch |
| K670N | Lys to Asn | 4 to 2 | Unknown |
| M671L | Met to Leu | 1 to 1 | Unknown |
| A713T | Ala to Thr | 1 to 2 | French |
| I716V | Ile to Val | 1 to 1 | Florida |
| V715M | Val to Met | 1 to 1 | French, Italian and Korean |
| E693G | Glu to Gly | 3 to 2 | Sweden |
| T714I | Thr to Ile | 2 to 1 | Austrian, American of African descent |
| T714A | Thr to Ala | 2 to 1 | Iranian |
| A673V | Ala to Val | 1 to 1 | Italian |

Table 5: Summary of the mutations with their amino acid change and damaging, according to their properties, and the geographic origin of each mutation of EOAD.

31 As we found the damage predictions given by PolyPhen and SIFT difficult to understand and interpret, we have expressed the damage in our own way. Since amino acids share certain characteristics and can be grouped accordingly, we express the damage as the change observed in these more general properties. These criteria are applied in all the following tables. The four main groups according to amino acid properties are: 1- Non-polar, 2- Polar, 3- Acidic, 4- Basic.
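
The grouping described in footnote 31 can be sketched as a simple lookup. The dictionary below follows the group numbers used in the tables of this project (for instance, Gly is counted as polar here); the function name `group_change` is hypothetical and the code is only meant as an illustration of the criterion, not as a damage predictor.

```python
# Group numbers as used in the tables of this project (footnote 31)
GROUP_NAMES = {1: "non-polar", 2: "polar", 3: "acidic", 4: "basic"}

AMINO_ACID_GROUP = {
    "Ala": 1, "Val": 1, "Leu": 1, "Ile": 1, "Met": 1, "Phe": 1, "Pro": 1, "Trp": 1,
    "Gly": 2, "Ser": 2, "Thr": 2, "Cys": 2, "Tyr": 2, "Asn": 2, "Gln": 2,
    "Asp": 3, "Glu": 3,
    "Lys": 4, "Arg": 4, "His": 4,
}


def group_change(original: str, mutated: str) -> str:
    """Return the 'DAMAGING' column entry, e.g. '3 to 2' for Glu to Gln."""
    return f"{AMINO_ACID_GROUP[original]} to {AMINO_ACID_GROUP[mutated]}"


def changes_group(original: str, mutated: str) -> bool:
    """True when the mutation moves the residue to a different property group."""
    return AMINO_ACID_GROUP[original] != AMINO_ACID_GROUP[mutated]


print("E693Q:", group_change("Glu", "Gln"), changes_group("Glu", "Gln"))
print("V717I:", group_change("Val", "Ile"), changes_group("Val", "Ile"))
```

As several mutations in the table show (for example V717I), a change within the same group can still be pathogenic, so this criterion is only a first approximation.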

None of these fifteen mutations was found as the ancestral sequence in any of the ten animals chosen for the alignments (humans not counted). However, at the position of some mutations, the animals’ sequences differ from the human one in a way that leaves the amino acid unchanged (a synonymous change). We found such changes at the positions of the following mutations in the following animals:

A692G

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | TCC | Gly | 2 |
| HUMAN | TGC | Ala | 1 |
| OLIVE-BABOON | GGC | Ala | 1 |
| VERVET-AGM | GGC | Ala | 1 |
| MACAQUE | GGC | Ala | 1 |

Table 6: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

In this case, glycine (Gly) is a polar amino acid and alanine (Ala) is a non-polar amino acid. This major difference in their properties may disrupt the structure of the protein in a way that leads to AD. The three species listed (olive baboon, vervet-AGM and macaque) have a different codon from humans, but it translates into the same amino acid (Ala), so it does not affect the structure of the protein.
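
When the codons in the APP tables are checked against a standard codon table, they appear to match the listed amino acids only after being reverse-complemented, which would be consistent with the sequence having been read from the opposite DNA strand. This is an interpretation of the tables, not something stated in the project, so the sketch below should be taken as an assumption. It rebuilds the standard genetic code and translates a codon read from either strand; all names are hypothetical.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' marks a STOP codon)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read on the opposite DNA strand."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_codon(codon: str, opposite_strand: bool = False) -> str:
    """Translate one DNA codon into a one-letter amino acid code."""
    if opposite_strand:
        codon = reverse_complement(codon)
    return CODON_TABLE[codon]


# Codons from the A692G table, assumed to be written on the opposite strand
rows = [("HUMAN MUTATION", "TCC"), ("HUMAN", "TGC"), ("OLIVE-BABOON", "GGC")]
for label, codon in rows:
    print(label, codon, translate_codon(codon, opposite_strand=True))
```

Under this assumption, TCC is read as GGA (Gly) and both TGC and GGC are read as alanine codons, in agreement with the amino acids listed in the table.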

A713T

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | CGT | Thr | 2 |
| HUMAN | CGC | Ala | 1 |
| HORSE | TGC | Ala | 1 |
| PIG | TGC | Ala | 1 |
| OLIVE BABOON | TGC | Ala | 1 |
| MARMOSET | TGC | Ala | 1 |
| VERVET-AGM | TGC | Ala | 1 |
| MACAQUE | TGC | Ala | 1 |

Table 7: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Alanine (Ala) is a non-polar amino acid, whereas threonine (Thr) is a polar amino acid. This major difference between the amino acids may disrupt the structure of the protein APP in a way that leads to AD. However, there are 6 species that present a different nucleotide in this codon without it causing EOAD, because the resulting amino acid is the same32.

I716V

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | GAC | Val | 1 |
| HUMAN | GAT | Ile | 1 |
| HORSE | AAT | Ile | 1 |
| PIG | AAT | Ile | 1 |
| MARMOSET | AAT | Ile | 1 |

Table 8: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

32 This is known as the redundancy of the genetic code.

Although isoleucine (Ile) and valine (Val) both belong to the non-polar amino acid group, the differences between them must cause a major change in the structure of the protein, as this mutation causes AD in humans. Horses, pigs and marmosets have a different codon, but it results in the same amino acid as in humans who do not have the mutation. Therefore, the protein is not affected by it.

T714A/T714I

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | TAT | Ile | 1 |
| HUMAN | TGT | Thr | 2 |
| PIG | GGT | Thr | 2 |

Table 9: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Isoleucine (Ile) is a non-polar amino acid and threonine (Thr) is a polar amino acid; this big difference between the amino acids may cause a change in the structure of APP, leading to AD. On the other hand, pigs have a different codon from humans, but it translates into the same amino acid. Therefore, the protein is not affected by this change.

A673V

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | TAC | Val | 1 |
| HUMAN | TGC | Ala | 1 |
| HORSE | CGC | Ala | 1 |

Table 10: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Both valine (Val) and alanine (Ala) are non-polar amino acids with similar structures, so their chemical properties should be similar as well, but Val is bigger than Ala. This difference may be enough to change the structure of APP in a way that leads to AD. On the other hand, horses have a different codon that translates into the same amino acid. Therefore, it does not affect the structure of the protein.

In conclusion, all the animals’ codons studied in APP that differ from the corresponding human ones result in a synonymous translation of the amino acid. Therefore, the protein’s structure and function are not altered in any case by these codons.

4.1.2 PSEN1 gene

We have found that there are 34 SNP mutations in PSEN1 that cause early onset AD. These mutations are:

| MUTATION | AMINO ACID CHANGE | DAMAGING (amino acid properties) | GEOGRAPHIC ORIGIN |
| --- | --- | --- | --- |
| M146L | Met to Leu | 1 to 1 | Italian |
| M146L | Met to Leu | 1 to 1 | Unknown |
| A246E | Ala to Glu | 1 to 3 | Unknown |
| H163R | His to Arg | 4 to 4 | American |
| L286V | Leu to Val | 1 to 1 | Swedish |
| C410Y | Cys to Tyr | 2 to 2 | Unknown |
| M139V | Met to Val | 1 to 1 | German, British, African American |
| M146L | Met to Leu | 1 to 1 | Italian and Greek |
| H163Y | His to Tyr | 4 to 2 | Swedish |
| E280A | Glu to Ala | 3 to 1 | Colombian |
| E280G | Glu to Gly | 3 to 2 | Unknown |
| P267S | Pro to Ser | 1 to 2 | Unknown |
| R278T | Arg to Thr | 4 to 2 | Finnish and Australian |
| E120D | Glu to Asp | 3 to 3 | Israeli |
| A426P | Ala to Pro | 1 to 1 | Scottish-Irish |
| M146I | Met to Ile | 1 to 1 | Swedish |
| M146I | Met to Ile | 1 to 1 | Unknown |
| L250S | Leu to Ser | 1 to 2 | Unknown |
| C92S | Cys to Ser | 2 to 2 | Unknown |
| G206A | Gly to Ala | 2 to 1 | Caribbean Hispanic |
| G266S | Gly to Ser | 2 to 2 | Unknown |
| L113P | Leu to Pro | 1 to 1 | Unknown |
| L166P | Leu to Pro | 1 to 1 | Swedish |
| L174M | Leu to Met | 1 to 1 | Cuban (Spanish) |
| L271V | Leu to Val | 1 to 1 | Unknown |
| G183V | Gly to Val | 2 to 1 | Unknown |
| P436Q | Pro to Gln | 1 to 2 | Unknown |
| R278I | Arg to Ile | 4 to 1 | Unknown |
| L85P | Leu to Pro | 1 to 1 | Unknown |
| A431E | Ala to Glu | 1 to 3 | Guadalajara, southern California, Chicago and Mexico |
| D333G | Asp to Gly | 3 to 2 | African American |
| A79V | Ala to Val | 1 to 1 | Unknown |
| S170F | Ser to Phe | 2 to 1 | Unknown |
| G217R | Gly to Arg | 2 to 4 | Irish / English |

Table 11: summary of the mutations with their amino acid change and damaging, according to their properties, and the geographic origin of each mutation of EOAD.

None of these thirty-four mutations was found as the ancestral sequence in any of the ten animals chosen for the alignments (humans not counted). However, at the position of some mutations, the animals’ sequences differ from the human one in a way that leaves the amino acid unchanged (a synonymous change). We found such changes at the positions of the following mutations in the following animals:

A246E

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | GAG | Glu | 3 |
| HUMAN | GCG | Ala | 1 |
| MARMOSET | GCA | Ala | 1 |
| MOUSE | GCA | Ala | 1 |

Table 12: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Alanine (Ala) is a non-polar amino acid, whereas glutamic acid (Glu) is an acidic one. This major difference in their structure and chemical properties may cause a major disruption of the protein, leading to AD. On the other hand, although marmosets and mice have a different nucleotide in this codon, the difference does not affect the protein, as the codon is still translated into the same amino acid, in this case alanine.
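
The check applied in each of these tables, namely whether an animal’s codon translates into the same amino acid as the human reference codon, can be sketched as follows. The code is a minimal illustration that rebuilds the standard genetic code and uses the codons of the A246E table; the function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' marks a STOP codon)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon_a: str, codon_b: str) -> bool:
    """True when both codons translate into the same amino acid."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


human_reference = "GCG"  # Ala at position 246 of PSEN1
other_codons = {"HUMAN MUTATION": "GAG", "MARMOSET": "GCA", "MOUSE": "GCA"}

for name, codon in other_codons.items():
    kind = "synonymous" if is_synonymous(human_reference, codon) else "non-synonymous"
    print(f"{name}: {codon} is {kind} with respect to {human_reference}")
```

Here the marmoset and mouse codons come out as synonymous, while the human mutation does not, which is the pattern described in the paragraph above.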

L286V

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | GTC | Val | 1 |
| HUMAN | CTC | Leu | 1 |
| MARMOSET | CTT | Leu | 1 |
| MOUSE | CTT | Leu | 1 |
| PIG | CTT | Leu | 1 |
| HORSE | CTT | Leu | 1 |

Table 13: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Both valine (Val) and leucine (Leu) are non-polar amino acids and their structures are very similar, although leucine is slightly bigger, as it has one more CH2 group than valine. Nevertheless, a single nucleotide change at this site produces valine instead of leucine, which leads to AD. This contrasts with our prediction that a disease-causing amino acid would differ greatly from the original one. Although marmosets, mice, pigs and horses have a different codon, it does not affect the structure of the protein, as the change is synonymous.

C410Y

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | TAT | Tyr | 2 |
| HUMAN | TGT | Cys | 2 |
| MOUSE | TGC | Cys | 2 |
| HORSE | TGC | Cys | 2 |

Table 14: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

As in the previous mutation, both amino acids, tyrosine (Tyr) and cysteine (Cys), belong to the same group (in this case they are polar), but their structures differ: tyrosine carries a phenol group, whereas cysteine carries a thiol (-SH) group. These differences may be responsible for the change that leads to the disease. The different codon found in mice and horses is translated into the same amino acid, in this case cysteine. Therefore, the protein is not affected by it.

C92S

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | TCC | Ser | 2 |
| HUMAN | TGC | Cys | 2 |
| PIG | TGT | Cys | 2 |
| HORSE | TGT | Cys | 2 |

Table 15: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Serine (Ser) and cysteine (Cys) both belong to the polar amino acid group and their structures are very similar: they differ only in that serine has a hydroxyl (-OH) group where cysteine has a thiol (-SH) group. We see, again, that small changes can lead to a serious disease. As for the alignments, the codons in pigs and horses are synonymous as well, so the protein undergoes no changes because of them.

G266S

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | AGT | Ser | 2 |
| HUMAN | GGT | Gly | 2 |
| MOUSE | GGC | Gly | 2 |
| PIG | GGC | Gly | 2 |

Table 16: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

In this mutation as well, both amino acids, serine (Ser) and glycine (Gly), are polar. They differ in that the side chain of serine carries a hydroxyl (-OH) group, whereas that of glycine is a single hydrogen atom, and this change alone may produce an incorrect protein that causes the disease. Regarding the alignments, we see once more that the codons of mice and pigs are synonymous, so the protein is not affected.

G217R

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | CGT | Arg | 4 |
| HUMAN | GGT | Gly | 2 |
| CHIMPANZEE | GGA | Gly | 2 |
| MOUSE | GGC | Gly | 2 |
| PIG | GGC | Gly | 2 |

Table 17: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Arginine (Arg) is a basic amino acid, whereas glycine (Gly) is a polar one. Therefore, their structures differ considerably, which may change the structure of the protein and, in turn, lead to AD. The differences in the DNA codons of chimpanzees, mice and pigs do not alter the protein, as they are synonymous. It is surprising, however, that chimpanzees present a change, as they are genetically the closest species to us.

H163Y

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | TAT | Tyr | 2 |
| HUMAN | CAT | His | 4 |
| MOUSE | CAC | His | 4 |

Table 18: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Histidine (His) is a basic amino acid; when it is replaced by tyrosine (Tyr), a polar amino acid, at this exact position, AD develops, probably because the difference in structure affects the protein PSEN1. Mice, although they have a single nucleotide difference in the codon, show no alteration in the protein, as the change found in their sequence is synonymous.

E280A

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | GCA | Ala | 1 |
| HUMAN | GAA | Glu | 3 |
| MOUSE | GAG | Glu | 3 |

Table 19: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Alanine (Ala) is a non-polar amino acid and glutamic acid (Glu) is an acidic one; this single nucleotide polymorphism leads to a different structure in the protein and, in turn, to AD. Mice, as in previous mutations, present a different nucleotide in the codon, but the change is synonymous, so the protein is not altered.

E280G

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | GGA | Gly | 2 |
| HUMAN | GAA | Glu | 3 |
| MOUSE | GAG | Glu | 3 |

Table 20: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

In this mutation, an acidic amino acid, glutamic acid (Glu), is replaced by a polar one, glycine (Gly), as the result of an SNP, and this leads to AD. Mice also present a different nucleotide in the codon, but it translates into the same amino acid.

P436Q

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | CAA | Gln | 2 |
| HUMAN | CCA | Pro | 1 |
| MOUSE | CCC | Pro | 1 |

Table 21: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

In this case, proline (Pro), a non-polar amino acid, is replaced by glutamine (Gln), a polar one, and this replacement results in the development of AD. As in previous mutations, mice present a synonymous codon with one different nucleotide that does not alter the protein.

A431E

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | GAA | Glu | 3 |
| HUMAN | GCA | Ala | 1 |
| MOUSE | GCG | Ala | 1 |

Table 22: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

With a single SNP, alanine (Ala), which is non-polar, is replaced by glutamic acid (Glu), which is acidic, and this leads to AD. Mice again present a different nucleotide in the codon, but the change is synonymous, so it does not affect the structure of the protein.

G206A

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | GCT | Ala | 1 |
| HUMAN | GGT | Gly | 2 |
| HORSE | GGC | Gly | 2 |

Table 23: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Glycine (Gly) and alanine (Ala) correspond to different amino acid groups, which means that their properties are quite different. The replacement of glycine by alanine alters the structure of the protein, causing AD. Horses present a different nucleotide in the codon affected by the mutation that results in a synonymous change. Therefore, it has no consequences on the protein structure and function.

S170F

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | TTT | Phe | 1 |
| HUMAN | TCT | Ser | 2 |
| PIG | TCC | Ser | 2 |

Table 24: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

As previously mentioned, amino acids from different groups have different characteristics, and exchanging one for another may alter the structure of the protein. This is the case for this mutation, in which the change from serine (Ser) to phenylalanine (Phe) leads to AD. Pigs present a different codon from humans. However, it is synonymous, so the protein is not altered.

In conclusion, all the animals’ codons studied in PSEN1 that differ from the corresponding human ones result in a synonymous translation of the amino acid. Therefore, the protein’s structure and function are not altered in any case by these codons.
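
The ranking of animals by number of codon differences, mentioned at the beginning of this chapter, can be tallied directly from the PSEN1 tables above (Tables 12 to 24). The sketch below is only an illustration of that count: the dictionary simply lists, for each mutation, the animals that appear in its table, and the variable names are hypothetical.

```python
from collections import Counter

# Animals listed with a different codon in each PSEN1 table (Tables 12 to 24)
PSEN1_DIFFERENCES = {
    "A246E": ["MARMOSET", "MOUSE"],
    "L286V": ["MARMOSET", "MOUSE", "PIG", "HORSE"],
    "C410Y": ["MOUSE", "HORSE"],
    "C92S": ["PIG", "HORSE"],
    "G266S": ["MOUSE", "PIG"],
    "G217R": ["CHIMPANZEE", "MOUSE", "PIG"],
    "H163Y": ["MOUSE"],
    "E280A": ["MOUSE"],
    "E280G": ["MOUSE"],  # same codon (280) as E280A, counted once per table here
    "P436Q": ["MOUSE"],
    "A431E": ["MOUSE"],
    "G206A": ["HORSE"],
    "S170F": ["PIG"],
}

counts = Counter(
    animal for animals in PSEN1_DIFFERENCES.values() for animal in animals
)

# Animals ordered from most to fewest differences with respect to humans
for animal, n in counts.most_common():
    print(f"{animal}: {n}")
```

For PSEN1 this tally places the mouse first and the chimpanzee last, which is in line with the prediction that phylogenetically more distant species should show more differences.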

4.1.3 PSEN2 gene

We have found that there are 9 SNP mutations in PSEN2 that can cause early-onset familial AD. These mutations are:

| MUTATION | AMINO ACID CHANGE | DAMAGING (amino acid properties) | GEOGRAPHIC ORIGIN |
| --- | --- | --- | --- |
| N141I | Asn to Ile | 2 to 1 | German |
| M239V | Met to Val | 1 to 1 | Italian |
| D439A | Asp to Ala | 3 to 1 | Unknown |
| T430M | Thr to Met | 2 to 1 | Unknown |
| T122P | Thr to Pro | 2 to 1 | Unknown |
| M239I | Met to Ile | 1 to 1 | Unknown |
| T122R | Thr to Arg | 2 to 4 | Unknown |
| S130L | Ser to Leu | 2 to 1 | Unknown |
| A85V | Ala to Val | 1 to 1 | Sardinian |

Table 25: summary of the mutations with their amino acid change and damaging, according to their properties, and the geographic origin of each mutation of EOAD.

None of these nine mutations was found as the ancestral sequence in any of the ten animals chosen for the alignments (humans not counted). However, at the position of some mutations, the animals’ sequences differ from the human one in a way that leaves the amino acid unchanged (a synonymous change). We found such changes at the positions of the following mutations in the following animals:

T430M

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | ATG | Met | 1 |
| HUMAN | ACG | Thr | 2 |
| MOUSE | ACA | Thr | 2 |

Table 26: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

While threonine (Thr) is a polar amino acid, methionine (Met) is non-polar. When humans carry the T430M SNP they develop AD. When we look at the genomic alignment, we see that mice present a different codon from humans at the same position. However, the resulting amino acid is the same (Thr).

S130L

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | TTG | Leu | 1 |
| HUMAN | TCG | Ser | 2 |
| PIG | TCC | Ser | 2 |

Table 27: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Serine (Ser) is a polar amino acid and leucine (Leu) a non-polar one. When humans carry the S130L SNP they develop AD. When we look at the genomic alignment, we see that pigs present a different codon at the same position. However, the resulting amino acid is the same (Ser), so the protein is not affected.

To sum up, all the animals’ codons studied in PSEN2 that differ from the corresponding human ones result in a synonymous translation of the amino acid. Therefore, the protein’s structure and function are not altered in any case by these codons.

4.2 Mutations leading to EOPD and comparison with animals’ genome

In the codons that lead to EOPD in humans when altered, we found forty-eight differences among all the animals chosen. Forty-two of them are synonymous and the other six are non-synonymous. Of these six, three correspond to the very change that leads to PD in humans, while the other three translate into a different amino acid.

4.2.1 DJ1 gene

We have found that there are 6 SNP mutations in DJ1 that cause early-onset familial PD. These mutations are:

| MUTATION | AMINO ACID CHANGE | DAMAGING (amino acid properties) | GEOGRAPHIC ORIGIN |
| --- | --- | --- | --- |
| L166P | Leu to Pro | 1 to 1 | Italian |
| M26I | Met to Ile | 1 to 1 | Ashkenazi-Jewish |
| D149A | Asp to Ala | 3 to 1 | Afro-Caribbean |
| E64D | Glu to Asp | 3 to 3 | Unknown |
| E163K | Glu to Lys | 3 to 4 | Southern Italy |
| A39S | Ala to Ser | 1 to 2 | Unknown |

Table 28: Summary of the mutations with their amino acid change and damaging, according to their properties, and the geographic origin of each mutation of EOPD.

None of these six mutations was found as the ancestral sequence in any of the ten animals chosen for the alignments (humans not counted). However, at the position of some mutations, the animals’ sequences differ from the human one. Some of these differences are synonymous, but others are not. We found such changes at the positions of the following mutations in the following animals:

L166P

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | CCT | Pro | 1 |
| HUMAN | CTT | Leu | 1 |
| MOUSE | CTA | Leu | 1 |
| HORSE | CTG | Leu | 1 |

Table 29: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Although proline (Pro) and leucine (Leu) are both non-polar amino acids, the differences between them must cause a change in the structure of DJ1. On the other hand, mice and horses have a different codon that translates into the same amino acid, so it does not affect the structure of the protein.

E64D

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | GAC | Asp | 3 |
| HUMAN | GAG | Glu | 3 |
| MOUSE | CAG | Gln | 2 |
| HORSE | CAG | Gln | 2 |

Table 30: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

In this case, aspartic acid (Asp) and glutamic acid (Glu) are both acidic amino acids, so their structures are similar, but Glu is bigger, as it has one more CH2 group than aspartic acid. This difference must cause a disruption in the protein DJ1 that leads to EOPD. Mice and horses present a different codon that translates into glutamine (Gln), a polar amino acid. Faced with this situation, we had to discuss the possible consequences of this change with an expert.

4.2.2 PLA2G6 gene

We have found that there are 13 SNP mutations in PLA2G6 that cause early onset PD. These mutations are:

| MUTATION | AMINO ACID CHANGE | DAMAGING (amino acid properties) | GEOGRAPHIC ORIGIN |
| --- | --- | --- | --- |
| Y790X | Tyr to X | 2 to X | Unknown |
| K545T | Lys to Thr | 4 to 2 | Pakistani |
| V310E | Val to Glu | 1 to 3 | Unknown |
| R632W | Arg to Trp | 4 to 1 | Unknown |
| A80T | Ala to Thr | 1 to 2 | Unknown |
| R741Q | Arg to Gln | 4 to 2 | Indian |
| R747W | Arg to Trp | 4 to 1 | Pakistani |
| R635Q + Q452X | Arg to Gln to X | 4 to 2 to X | Japanese |
| R635Q + F72L | Arg to Gln + Phe to Leu | 4 to 2 + 1 to 1 | Japanese |
| Q452X | Gln to X | 2 to X | Unknown |
| F72L | Phe to Leu | 1 to 1 | Unknown |
| R37X | Arg to X | 4 to X | Unknown |
| D331Y | Asp to Tyr | 3 to 2 | Chinese |

Table 31: Summary of the mutations with their amino acid change and damaging, according to their properties, and the geographic origin of each mutation of EOPD.

None of these thirteen mutations was found as the ancestral sequence in any of the ten animals chosen for the alignments (humans not counted). However, at the position of some mutations, the animals’ sequences differ from the human one in a way that leaves the amino acid unchanged (a synonymous change). We found such changes at the positions of the following mutations in the following animals:

Y790X

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | TAG | STOP | STOP |
| HUMAN | TAT | Tyr | 2 |
| PIG | TAC | Tyr | 2 |

Table 32: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

In this case, humans present an SNP that changes the tyrosine (Tyr) codon into a STOP codon. This means that translation of the protein stops prematurely, which causes a major change in the structure of the protein and leads to PD. On the other hand, pigs have a different codon from humans, but it translates into the same amino acid, so it does not affect the protein.
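
The three outcomes discussed in this chapter (same amino acid, different amino acid, or premature STOP) can be told apart with a small helper. The sketch below is an illustration only: it rebuilds the standard genetic code and uses the standard terms "synonymous", "missense" and "nonsense" as labels; the function name is hypothetical and the codons are those of the Y790X and E64D tables.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' marks a STOP codon)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snp_effect(reference: str, variant: str) -> str:
    """Classify the effect of replacing `reference` with `variant`."""
    ref_aa, var_aa = CODON_TABLE[reference], CODON_TABLE[variant]
    if ref_aa == var_aa:
        return "synonymous"  # same amino acid, protein unchanged
    if var_aa == "*":
        return "nonsense"    # STOP codon, translation ends prematurely
    return "missense"        # a different amino acid is incorporated


print("Y790X, human mutation:", snp_effect("TAT", "TAG"))
print("Y790X, pig:", snp_effect("TAT", "TAC"))
print("E64D, human mutation:", snp_effect("GAG", "GAC"))
print("E64D, mouse and horse:", snp_effect("GAG", "CAG"))
```

The last line corresponds to the DJ1 case in which mice and horses carry glutamine instead of glutamic acid, one of the non-synonymous differences found in the animals.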

R632W

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | TGG | Trp | 1 |
| HUMAN | CGG | Arg | 4 |
| MOUSE | CGT | Arg | 4 |

Table 33: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Tryptophan (Trp) is a non-polar amino acid and arginine (Arg) is a basic amino acid. The differences between these amino acids may cause a major change in the protein that leads to PD. Mice have a different codon, but it translates into the same amino acid, so it does not cause PD.

R747W

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | TGG | Trp | 1 |
| HUMAN | CGG | Arg | 4 |
| MOUSE | CGA | Arg | 4 |

Table 34: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Tryptophan (Trp) is a non-polar amino acid and arginine (Arg) is a basic amino acid. The differences between these amino acids may cause a major change in the protein that leads to PD. Mice have a different codon, but it translates into the same amino acid, so it does not cause PD.

R635Q

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | CAA | Gln | 2 |
| HUMAN | CGA | Arg | 4 |
| MOUSE | CGG | Arg | 4 |
| PIG | CGG | Arg | 4 |

Table 35: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Glutamine (Gln) is a polar amino acid and arginine (Arg) is a basic amino acid. This difference in their structure and properties may be the cause of the major change in the protein that leads to PD. Mice and pigs present a different codon that translates into the same amino acid, so it does not cause PD.

R37X

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | TGA | STOP | STOP |
| HUMAN | CGA | Arg | 4 |
| MOUSE | CGT | Arg | 4 |
| HORSE | CGG | Arg | 4 |
| PIG | CGG | Arg | 4 |
| MARMOSET | CGG | Arg | 4 |

Table 36: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

In this case, humans present an SNP that changes the arginine (Arg) codon into a STOP codon. This means that translation of the protein stops prematurely, which causes a major change in the structure of the protein and leads to EOPD. On the other hand, these four species have a different codon from humans, but it translates into the same amino acid, so it does not affect the protein.

In conclusion, all the animals’ codons studied in PLA2G6 that differ from the corresponding human ones result in a synonymous translation of the amino acid. Therefore, the protein’s structure and function are not altered in any case by these codons.

4.2.3 PINK1 gene

We have found that there are 10 SNP mutations in PINK1 that cause early onset PD. These mutations are:

| MUTATION | AMINO ACID CHANGE | DAMAGING (amino acid properties) | GEOGRAPHIC ORIGIN |
| --- | --- | --- | --- |
| G309D | Gly to Asp | 2 to 3 | Spanish |
| W437X | Trp to X | 1 to X | Italian |
| R246X | Arg to X | 4 to X | Japanese and Israeli |
| H271Q | His to Gln | 4 to 2 | Japanese |
| L347P | Leu to Pro | 1 to 1 | Filipino |
| R279H | Arg to His | 4 to 4 | Italian / Korean |
| T313M | Thr to Met | 2 to 1 | Saudi Arabian |
| A217D | Ala to Asp | 1 to 3 | Sudanese |
| Q456X | Gln to X | 2 to X | German / Tunisian |
| P399L | Pro to Leu | 1 to 1 | Chinese |

Table 37: Summary of the mutations with their amino acid change and damaging, according to their properties, and the geographic origin of each mutation of EOPD.

None of these ten mutations was found as the ancestral sequence in any of the ten animals chosen for the alignments (humans not counted). However, at the position of some mutations, the animals’ sequences differ from the human one in a way that leaves the amino acid unchanged (a synonymous change). We found such changes at the positions of the following mutations in the following animals:

R246X

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | TGA | STOP | STOP |
| HUMAN | CGA | Arg | 4 |
| MOUSE | CGC | Arg | 4 |
| PIG | CGG | Arg | 4 |

Table 38: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Arginine (Arg) is a basic amino acid whose codon, with an SNP, turns into a STOP codon. When humans carry the R246X mutation they develop EOPD. When we look at the genomic alignment of this gene, we see that the mouse and the pig, in spite of presenting different codons from humans, code for the same amino acid (Arg), implying that the protein structure and function should not change.

T313M

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | ATG | Met | 1 |
| HUMAN | ACG | Thr | 2 |
| MOUSE | ACA | Thr | 2 |
| PIG | ACA | Thr | 2 |

Table 39: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Methionine (Met) is a non-polar amino acid. By contrast, threonine (Thr) is a polar amino acid. When humans carry the T313M mutation, they develop EOPD, which may indicate that the change in amino acid characteristics affects the normal protein negatively. In the genomic alignment, we see that the mouse and the pig present different codons from humans. However, the resulting amino acid is the same as the human one (Thr).

R279H

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | CAC | His | 4 |
| HUMAN | CGC | Arg | 4 |
| GORILLA | CGT | Arg | 4 |
| PIG | CGT | Arg | 4 |

Table 40: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Arginine (Arg) and histidine (His) are both basic amino acids. However, a person who has the R279H SNP develops PD, indicating that the amino acid change, despite staying within the same amino acid group, is enough to affect the protein negatively. In the genomic alignment, we see that gorillas and pigs present different codons, but the resulting amino acid is the same as in humans (Arg).

Q456X

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | TAG | STOP | STOP |
| HUMAN | CAG | Gln | 2 |
| MOUSE | CAA | Gln | 2 |
| PIG | CAA | Gln | 2 |

Table 41: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Glutamine (Gln) is a polar amino acid whose codon, with an SNP, turns into a STOP codon. In the genomic alignment, we see that mice and pigs present a different sequence. However, the resulting amino acid is the same as in the ancestral human codon (Gln).

P399L

| ANIMALS | CODON (DNA) | AMINO ACID | AMINO ACID GROUP |
| --- | --- | --- | --- |
| HUMAN MUTATION | CTC | Leu | 1 |
| HUMAN | CCC | Pro | 1 |
| MOUSE | CCT | Pro | 1 |
| PIG | CCT | Pro | 1 |

Table 42: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Leucine and proline are non-polar amino acids. However, when humans suffer the P399L mutation they develop EOPD. When we perform the genomic alignment, we see that mouse and pig present different codons than human but the resulting amino acid is the same (Pro).

4.2.4 VPS35 gene

We have found that there is only one SNP mutation that causes early onset PD in VPS35. This mutation and its characteristics are:


MUTATION   AMINO ACID CHANGE   DAMAGING (amino acid properties)   GEOGRAPHIC ORIGIN
D620N      Asp to Asn          3 to 2                             Caucasian (ethnic origin) / Germany (geographic origin)

Table 43: summary of the mutations with their amino acid change and damaging, according to their properties, and the geographic origin of each mutation of EOPD.

When we built the alignments, we saw that the codons of all the animals matched the ancestral human sequence.

4.2.5 SNCA gene

We have found that there are 6 SNP mutations in SNCA that cause early onset PD. These mutations are:

MUTATION   AMINO ACID CHANGE   DAMAGING (amino acid properties)   GEOGRAPHIC ORIGIN
A53T       Ala to Thr          1 to 2                             Italian / Greek
A30P       Ala to Pro          1 to 1                             German
E46K       Glu to Lys          3 to 4                             Spanish (Basque Country)
G51D       Gly to Asp          2 to 3                             French
H50Q       His to Gln          4 to 2                             Caucasian English

Table 44: summary of the mutations with their amino acid change and damaging, according to their properties, and the geographic origin of each mutation of EOPD.

Out of these six, one has been found as ancestral sequence in two of the ten animals we have chosen to make the alignments (without counting humans). In addition, there are some changes in the ancestral sequences in the position of the mutation that result in a synonymous change in the amino acid. We have found those changes in the position of the following mutations in the following animals:


A53T
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   ACA           Thr          2
HUMAN            GCA           Ala          1
MARMOSET         ACA           Thr          2
HORSE            ACA           Thr          2

Table 45: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

The nucleotide change leads to the translation of a different amino acid belonging to a different amino acid group. In this case alanine (Ala), non-polar, is replaced by threonine (Thr), polar. This alters the protein, causing PD. The marmoset and the horse also present a different nucleotide in the codon compared with humans, and it is precisely the one found in the human Mendelian mutation. Therefore, their codon also translates into threonine (Thr). In humans this leads to PD, but we did not know what happens in these animals. As we could not find any literature or references to help us interpret this result, we asked some experts in the subject for advice.

E46K
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   AAG           Lys          4
HUMAN            GAG           Glu          3
HORSE            GAA           Glu          3

Table 46: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Glutamic acid (Glu) and lysine (Lys) belong to different amino acid groups, so our first guess is that replacing one with the other should have a major impact on the protein, and that is exactly what happens: the replacement of glutamic acid by lysine causes PD. As the different nucleotide that horses present in the codon is synonymous, the protein remains intact.

H50Q
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   CAG           Gln          2
HUMAN            CAT           His          4
VERVET-AGM       CAC           His          4
MACAQUE          CAC           His          4
OLIVE-BABOON     CAC           His          4
HORSE            CAC           His          4

Table 47: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Glutamine (Gln), polar, and histidine (His), basic, are amino acids from different chemical groups. If the codon is altered so that it translates into glutamine instead of histidine, the person will suffer from PD. The vervet-AGM, the macaque, the olive baboon and the horse present a different nucleotide in the codon affected by the mutation but, as the change is synonymous, the protein is not altered.

4.2.6 ATP13A2 gene

We have found that there are 3 SNP mutations in ATP13A2 that can cause early onset PD. These mutations and their characteristics are:

MUTATION   AMINO ACID CHANGE   DAMAGING (amino acid properties)   GEOGRAPHIC ORIGIN
G504R      Gly to Arg          2 to 4                             Brazilian
M810R      Met to Arg          1 to 4                             Belgian
G877R      Gly to Arg          2 to 4                             Italian

Table 48: summary of the mutations with their amino acid change and damaging, according to their properties, and the geographic origin of each mutation of EOPD.


Out of these three, none has been found as ancestral sequence in any of the ten animals we have chosen to make the alignments (without counting humans). However, there are some changes in the ancestral sequences in the position of the mutation that result in a synonymous change in the amino acid. We have found those changes in the position of the following mutations in the following animals:

G504R
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   CGG           Arg          4
HUMAN            GGG           Gly          2
MACAQUE          GGC           Gly          2

Table 49: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

In this case, when a human suffers the G504R SNP the amino acid glycine (polar) is replaced by arginine (basic); this change will lead to the development of PD.

When we do the alignment, we see that macaques have a different codon than humans, but the resulting amino acid is the same (Gly).

G877R
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   AGA           Arg          4
HUMAN            GGA           Gly          2
MOUSE            GGG           Gly          2

Table 50: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

While glycine (Gly) is a polar amino acid, arginine (Arg) is a basic one. When humans carry the G877R mutation they develop PD because the amino acid change alters the structure of the ATP13A2 protein.

By doing the genomic alignment we see that mice, although they have a different codon, present the same amino acid as the normal human sequence. Therefore, the protein is not altered.


4.2.7 FBXO7 gene

We have found that there are 3 SNP mutations in FBXO7 that can cause early onset PD. These mutations and their characteristics are:

MUTATION   AMINO ACID CHANGE   DAMAGING (amino acid properties)   GEOGRAPHIC ORIGIN
R378G      Arg to Gly          4 to 2                             Iranian
R498X      Arg to Ter          4 to X                             Italian
T22M       Thr to Met          2 to 1                             Dutch

Table 51: summary of the mutations with their amino acid change and damaging, according to their properties, and the geographic origin of each mutation of EOPD.

Out of these three, none has been found as ancestral sequence in any of the ten animals we have chosen to make the alignments (without counting humans). However, there are some changes in the ancestral sequences in the position of the mutation. Some of them result in a synonymous change in the amino acid, but others do not. We have found those changes in the position of the following mutations in the following animals:

R378G
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   GGT           Gly          2
HUMAN            CGT           Arg          4
MOUSE            CGG           Arg          4
PIG              CGG           Arg          4
HORSE            CGG           Arg          4

Table 52: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.


Arginine (Arg) and glycine (Gly) have very different chemical properties: while arginine is basic and considerably large, glycine is a polar amino acid and the smallest of all, as its side chain (the radical that differentiates the amino acids) consists of a single hydrogen atom. In line with our hypothesis, the disease, in this case PD, must be caused by a major change in the amino acid sequence, which would be explained by the change from Arg to Gly. Although mice, pigs and horses present a different nucleotide in the codon affected by the mutation in humans, the change is synonymous, so there is no alteration in the protein.
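The comparison of amino acid groups used throughout these tables can be written as a small lookup. The following is only a minimal sketch: the group numbering (1 non-polar, 2 polar, 3 acidic, 4 basic) is inferred from the tables of this section, and the names `AA_GROUP` and `damaging` are hypothetical helpers introduced for illustration.

```python
# Amino acid groups as they appear in the tables of this section
# (assumption: 1 = non-polar, 2 = polar, 3 = acidic, 4 = basic).
AA_GROUP = {
    "Ala": 1, "Leu": 1, "Pro": 1, "Met": 1, "Val": 1, "Ile": 1, "Trp": 1, "Phe": 1,
    "Thr": 2, "Gln": 2, "Asn": 2, "Gly": 2, "Cys": 2, "Tyr": 2, "Ser": 2,
    "Asp": 3, "Glu": 3,
    "Arg": 4, "His": 4, "Lys": 4,
}


def damaging(ancestral: str, mutated: str) -> str:
    """Format a change as '<group> to <group>'; anything unknown (e.g. Ter) becomes 'X'."""
    return f"{AA_GROUP.get(ancestral, 'X')} to {AA_GROUP.get(mutated, 'X')}"


def same_group(ancestral: str, mutated: str) -> bool:
    """True when both amino acids are known and belong to the same group."""
    return ancestral in AA_GROUP and AA_GROUP[ancestral] == AA_GROUP.get(mutated)


print(damaging("Arg", "Gly"))    # R378G
print(damaging("Arg", "Ter"))    # R498X
print(same_group("Arg", "His"))  # R279H stays within the basic group
```

A lookup like this only describes the group change. As the R279H case shows, a change within the same group can still be associated with the disease.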

R498X
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   TGA           STOP         STOP
HUMAN            CGA           Arg          4
MOUSE            AGA           Arg          4

Table 53: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

In this case, the SNP turns the codon into a stop codon, which ends the protein prematurely. This major change causes PD. The mouse codon differs from the human one in one nucleotide, but the change is synonymous, so the protein is not affected.

T22M
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   ATG           Met          1
HUMAN            ACG           Thr          2
PIG              ACC           Thr          2
HORSE            ATA           Ile          1

Table 54: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.


In this case, the SNP translates into methionine (Met), non-polar, instead of threonine (Thr), polar, leading to PD. Pigs’ and horses’ codons are slightly different. In the case of pigs, the change is synonymous. For horses, however, it translates into isoleucine (Ile), which is also a non-polar amino acid. Having found ourselves in this situation, we had to discuss with an expert the possible consequences of this change.

4.2.8 LRRK2 gene

We have found that there are 7 SNP mutations in LRRK2 that cause early onset PD. These mutations and their characteristics are:

MUTATION   AMINO ACID CHANGE   DAMAGING (amino acid properties)   GEOGRAPHIC ORIGIN
R1441G     Arg to Gly          4 to 2                             Catalan / Basque
Y1699C     Tyr to Cys          2 to 2                             Portugal / Brazil
R1441C     Arg to Cys          4 to 2                             Unknown
I1122V     Ile to Val          1 to 1                             Unknown
G2019S     Gly to Ser          2 to 2                             France / Belgium / Portugal / Netherlands / Algeria / Morocco / Tunisia
I2020T     Ile to Thr          1 to 2                             European
R1441H     Arg to His          4 to 4                             Taiwanese

Table 55: summary of the mutations with their amino acid change and damaging, according to their properties, and the geographic origin of each mutation of EOPD.

Out of these seven, one has been found as ancestral sequence in one of the ten animals we have chosen to make the alignments (without counting humans). In addition, there are some changes in the ancestral sequences in the position of the mutation. Some of them result in a synonymous change in the amino acid, but others do not. We have found those changes in the position of the following mutations in the following animals:


R1441G
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   GGC           Gly          2
HUMAN            CGC           Arg          4
MARMOSET         CGT           Arg          4
MOUSE            CGT           Arg          4
PIG              CGT           Arg          4
HORSE            CGT           Arg          4

Table 56: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Glycine (Gly) is a polar amino acid and arginine (Arg) is a basic amino acid. The different properties of these amino acids may be what causes a major change in the protein, which leads to EOPD. Marmosets, mice, pigs and horses have a different codon, but it translates into the same amino acid as in humans, so it does not cause PD.

Y1699C
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   TGT           Cys          2
HUMAN            TAT           Tyr          2
MOUSE            TAC           Tyr          2

Table 57: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Although both cysteine (Cys) and tyrosine (Tyr) are polar amino acids, the differences between them must cause a major change in the structure of the protein, as this mutation causes PD in humans. Mice, on the other hand, have a different codon, but it does not change the amino acid and therefore does not change the protein.


R1441C
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   TGC           Cys          2
HUMAN            CGC           Arg          4
MARMOSET         CGT           Arg          4
MOUSE            CGT           Arg          4
PIG              CGT           Arg          4
HORSE            CGT           Arg          4

Table 58: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Cysteine (Cys) is a polar amino acid and arginine (Arg) is a basic amino acid. These differences in their structure may be the cause of a major change in the protein, which leads to PD. The other four animal species shown in the table present a different codon but that does not affect the amino acid.

R1441H
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   CAC           His          4
HUMAN            CGC           Arg          4
MARMOSET         CGT           Arg          4
MOUSE            CGT           Arg          4
PIG              CGT           Arg          4
HORSE            CGT           Arg          4

Table 59: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Histidine (His) and arginine (Arg) are both basic amino acids. However, their structures are slightly different: arginine has a carbon chain that includes several nitrogen atoms, whereas histidine has a ring in its side chain. These structural differences may cause a major change in the protein, leading to PD. The four animal species in the table present a different codon, but it does not affect the final amino acid.


I1122V
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   GTA           Val          1
HUMAN            ATA           Ile          1
MOUSE            ATT           Ile          1
PIG              GTA           Val          1
HORSE            CTA           Leu          1

Table 60: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Although both valine (Val) and isoleucine (Ile) are non-polar amino acids, the differences between them must cause a major change in the protein, because the mutation leads to PD. Mice have a different codon but that does not affect the amino acid and it is the same as humans. On the other hand, pigs present a codon that codifies for the same amino acid as the human mendelian mutation. This was also difficult for us to interpret, because of the lack of references on the subject, so we had to consult some experts. Horses also have a different codon which codifies for leucine (Leu), a non-polar amino acid, so our guess is that this may not alter the phenotype of the animals.

4.2.9 PARK2 gene

We have found that there are 11 SNP mutations in PARK2 that cause early onset PD. These mutations and their characteristics are:

MUTATION   AMINO ACID CHANGE   DAMAGING (amino acid properties)   GEOGRAPHIC ORIGIN
T240R      Thr to Arg          2 to 4                             Colombian
Q311X      Gln to Ter          2 to X                             Turkish
W453X      Trp to Ter          1 to X                             European
K161N      Lys to Asn          4 to 2                             European
A82Q       Ala to Gln          1 to 2                             Unknown
C212Y      Cys to Tyr          2 to 2                             Colombian
V56E       Val to Glu          1 to 3                             Unknown
R275W      Arg to Trp          4 to 1                             Italian
K211N      Lys to Asn          4 to 2                             Italian
T240M      Thr to Met          2 to 1                             Unknown
C431F      Cys to Phe          2 to 1                             Japanese

Table 61: summary of the mutations with their amino acid change and damaging, according to their properties, and the geographic origin of each mutation of EOPD.

Out of these eleven, none has been found as ancestral sequence in any of the ten animals we have chosen to make the alignments (without counting humans). However, there are some changes in the ancestral sequences in the position of the mutation. Some of them result in a synonymous change in the amino acid, but others do not. We have found those changes in the position of the following mutations in the following animals:

K161N
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   AAT           Asn          2
HUMAN            AAA           Lys          4
PIG              AAG           Lys          4

Table 62: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Asparagine (Asn) is a polar amino acid and lysine (Lys) a basic amino acid. Because they have different properties, we suppose that the resulting protein will be different from the original, and that might explain why humans carrying the K161N mutation develop PD.

When we perform the genomic alignment, we see that pigs have a different codon than humans at the same genomic coordinate. However, the resulting amino acid is the same as in humans.


A82Q
ANIMALS          CODON (DNA)   AMINO ACID   AMINO ACID GROUP
HUMAN MUTATION   GAA           Glu          2
HUMAN            GCA           Ala          1
PIG              ACA           Thr          2

Table 63: Summary of the animals that present a different codon from humans in the codon affected by the SNP mutation in humans and its impact in the amino acid translation.

Alanine (Ala) is a non-polar amino acid, while glutamine (Gln) and threonine (Thr) are polar amino acids. When someone carries the A82Q SNP, the structure of the new protein changes. That may be because the codon translates into a polar amino acid instead of a non-polar one, a change large enough to make the person develop PD. When we perform the genomic alignment, we see that pigs present a polar amino acid (Thr), but we suppose that they do not develop PD as, otherwise, we would be stating that a whole species suffers from a disease.

4.3 Quantification of synonymous and non-synonymous changes

As we can see, most of the changes we have presented are synonymous, which brings us back to the redundancy of the genetic code: there are a lot of cases in which if only the third nucleotide is changed, the translation is synonymous.
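The redundancy described above can be checked directly against the standard genetic code. The following is only a minimal sketch: the function name `classify` is a hypothetical helper, and the example codons are taken from the tables of this section.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in the order TTT, TTC, TTA, TTG, TCT, ... ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(ancestral: str, variant: str) -> str:
    """Classify a codon change as synonymous, missense or nonsense."""
    before = CODON_TABLE[ancestral.upper()]
    after = CODON_TABLE[variant.upper()]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    return "missense"


print(classify("CGC", "CGT"))  # human vs. marmoset codon at LRRK2 R1441
print(classify("GCA", "ACA"))  # SNCA A53T
print(classify("CAG", "TAG"))  # PINK1 Q456X
```

The first pair differs only in the third nucleotide and still codes for arginine, which is the pattern seen in most of the animal codons listed in this chapter.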

In the following table, the differences that the animals’ sequences present are shown, arranged by genes and indicating in which nucleotide the difference is found. As we can see, most of them are found in the third nucleotide, confirming all the synonymous changes we found.

33         1st nucleotide   2nd nucleotide   3rd nucleotide
APP        -                -                14
PSEN1      -                -                21
PSEN2      -                -                2
DJ1        2                -                2
PLA2G6     -                -                9
PINK1      -                -                10
VPS35      -                -                -
SNCA       2                -                5
ATP13A2    -                -                2
FBXO7 34   1                1                3+1
LRRK2      1+1              -                5
PARK2      1                -                1
TOTAL      8                1                75

Table 64: Summary of the nucleotide’s change position in the codon of the different species.

33 Those numbers in bold represent a change in the nucleotide that codifies for a different amino acid. Those in red codify for the amino acid responsible for the development of EOAD and EOPD in humans. The ones in blue codify for a different amino acid.

The table below summarizes, gene by gene, the codon position of the nucleotide changed by the SNP mutations that lead to EOAD and EOPD in humans. As we can see, there are more mutations in the first and second nucleotides, matching our guesses, as changes in the third one are usually synonymous.

          1st nucleotide   2nd nucleotide   3rd nucleotide
APP       -                3                2
PSEN1     4                9                -
PSEN2     -                2                -
DJ1       -                1                1
PLA2G6    3                1                1
PINK1     2                3                -
VPS35     -                -                -
SNCA      2                -                1
ATP13A2   2                -                -
FBXO7     2                1                -
LRRK2     3                2                -
PARK2     -                1                1
TOTAL     18               23               6

Table 65: Summary of the nucleotide’s change position in the codon of the SNP mutations in humans causing EOAD or EOPD.

34 The horse presents a change of two nucleotides, the second one and the third.
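The position counts in Tables 64 and 65 come from comparing two codons nucleotide by nucleotide. The following is only a minimal sketch of that comparison: `differing_positions` is a hypothetical helper name, and the four codon pairs are a small sample taken from the tables above, not the full data set.

```python
from collections import Counter


def differing_positions(codon_a: str, codon_b: str) -> list[int]:
    """Return the 1-based codon positions at which two codons differ."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must have exactly three nucleotides")
    pairs = zip(codon_a.upper(), codon_b.upper())
    return [pos for pos, (x, y) in enumerate(pairs, start=1) if x != y]


# (human ancestral codon, codon found in another species)
sample = [
    ("CGC", "CGT"),  # LRRK2 R1441 position, human vs. marmoset
    ("GAG", "GAA"),  # SNCA E46 position, human vs. horse
    ("GCA", "ACA"),  # SNCA A53 position, human vs. horse
    ("ACG", "ATA"),  # FBXO7 T22 position, human vs. horse
]

tally = Counter(
    pos for human, other in sample for pos in differing_positions(human, other)
)
for pos in (1, 2, 3):
    print(f"nucleotide {pos}: {tally[pos]} difference(s)")
```

The last pair differs in two positions, the second and the third, which is the horse case mentioned in footnote 34.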


4.4 dN/dS ratio

These tables show the average ratio of non-synonymous to synonymous changes (dN/dS) for each of the species we chose for the alignments, in the genes that determine EOAD and EOPD. The ratio is calculated by dividing dN by dS.

               APP       PSEN1     PSEN2
Chimpanzee     0.09722   n/a       0.03401
Gorilla        n/a       n/a       n/a
Horse          0.03510   0.16966   0.03077
Macaque        0.23006   n/a       0.00591
Marmoset       0.04793   0.23714   0.01336
Mouse          0.02260   0.07131   0.02386
Olive baboon   0.06033   0.50427   0.00550
Orangutan      0.02341   0.04977   0.01192
Pig            0.02484   0.12670   0.01781
Vervet-AGM     0.03071   0.02016   0.00570

Table 66: dN/dS ratio of the proteins involved in EOAD from different species


DJ1 PLA2G ATP13A FBXO7 LRRK2 PARK2 PINK1 SNCA VPS35 (PARK7 6 2 )

Chimpanze 1. 0.4382 0.4242 0.1453 0.10000 n/a n/a 0.33640 0.16463 e 17391 0 4 0

Gorilla n/a n/a n/a n/a n/a n/a n/a n/a

0.3002 0.1629 0.2099 0.1865 0.1405 0.0113 Horse 0.11176 0.10765 0.04413 1 1 7 0 6 2 0.4684 0.2366 0.2970 0.1635 0.3516 0.2870 Macaque 0.25632 0.18012 0.54053 3 5 6 5 5 9 0.5730 0.2056 0.1645 0.1764 0.5176 0.0071 Marmoset 0.23536 0.11542 0.03717 5 9 6 7 1 7 0.3700 0.1129 0.1291 0.1901 0.0987 0.0073 Mouse 0.06839 0.11539 0.06730 3 1 7 6 8 7

Olive 0.5502 0.4549 0.2412 0.1526 0.2465 0.0139 0.23652 0.21452 n/a baboon 4 9 6 0 8 5

0.6984 0.3167 0.1181 0.1465 0.0604 Orangutan 0.08538 n/a 0.19678 0.14773 1 1 6 3 0

0.3276 0.1630 0.2320 0.1317 0.0468 0.0088 Pig 0.09590 0.12970 0.04053 0 9 8 7 4 0

0.4694 0.2733 0.2053 0.1545 0.1528 Vervet-AGM 0.24959 n/a 0.14917 0.04887 7 7 2 2 4

Table 67: dN/dS ratio of the proteins involved in EOPD from different species.

Looking at the tables we built, we see that all the dN/dS ratios are below 1, which suggests that these genes are under negative selection. The only exception is the chimpanzee FBXO7, which is above 1. According to our expectations it should also be below 1 and, as this is not our field of expertise, we do not know how to explain it.

Negative selection is also known as purifying selection or stabilizing selection. It is a form of natural selection that is responsible for the preservation of the adaptive characteristics of organisms under constant environmental conditions. According to some researchers, this can lead to an increase of the genetic diversity of a population.35

35 This was stated by I. I. Shmal’gauzen. For more information, go to http://encyclopedia2.thefreedictionary.com/Purifying+selection
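The reading of the dN/dS ratio used in this section (below 1 for negative selection, above 1 for positive selection) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `None` stands for the "n/a" entries of the tables, the three sample ratios are copied from Table 66, and the dN and dS inputs in the last line are invented values used only to show a ratio above 1.

```python
from typing import Optional


def dn_ds_ratio(dn: float, ds: float) -> Optional[float]:
    """Divide dN by dS; return None when dS is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds


def selection_regime(ratio: Optional[float]) -> str:
    """Interpret a dN/dS ratio following the reading used in this section."""
    if ratio is None:
        return "n/a"
    if ratio < 1:
        return "negative (purifying) selection"
    if ratio > 1:
        return "positive selection"
    return "neutral"


# Sample values copied from Table 66 (None stands for "n/a").
ratios = {
    ("Horse", "APP"): 0.03510,
    ("Olive baboon", "PSEN1"): 0.50427,
    ("Gorilla", "APP"): None,
}
for (species, gene), ratio in ratios.items():
    print(f"{species} {gene}: {selection_regime(ratio)}")

# Invented dN and dS values, only to illustrate a ratio above 1.
print(selection_regime(dn_ds_ratio(0.6, 0.5)))
```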


4.5 Number of mutations and phylogenetic distance

The following table shows the number of differences in the affected codon that each species has.

ANIMALS        DIFFERENT CODONS THAN HUMANS (in the region studied)
CHIMPANZEE     1
GORILLA        2
ORANGUTAN      0
VERVET-AGM     3
MACAQUE        5
OLIVE BABOON   4
MARMOSET       8
MOUSE          26
PIG            26
HORSE          20

Table 68: Number of codons in each species that differ from the human ones.

This table gives us an idea of which species is closer to humans in evolution. However, it is not a reliable way to measure it, as it only takes into account those codons in which humans have a mutation leading to EOAD or EOPD.
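A count like the one in Table 68 can be sketched as a comparison of each species' codon with the human one at every studied position. This is only a minimal illustration: the `alignment` dictionary holds three positions copied from the tables of section 4.2, not the full data set, species that are not listed at a position are assumed to share the human codon, and `count_differences` is a hypothetical helper name.

```python
from collections import Counter

# Ancestral codon per species at three of the studied positions.
alignment = {
    "LRRK2 R1441": {"HUMAN": "CGC", "MARMOSET": "CGT", "MOUSE": "CGT",
                    "PIG": "CGT", "HORSE": "CGT"},
    "SNCA H50": {"HUMAN": "CAT", "VERVET-AGM": "CAC", "MACAQUE": "CAC",
                 "OLIVE-BABOON": "CAC", "HORSE": "CAC"},
    "SNCA E46": {"HUMAN": "GAG", "HORSE": "GAA"},
}


def count_differences(sites: dict, reference: str = "HUMAN") -> Counter:
    """Count, per species, the positions whose codon differs from the reference."""
    counts = Counter()
    for codons in sites.values():
        ref_codon = codons[reference]
        for species, codon in codons.items():
            if species != reference and codon != ref_codon:
                counts[species] += 1
    return counts


for species, n in count_differences(alignment).most_common():
    print(species, n)
```

As noted above, such a count only covers the disease-related codons, so it is a rough indication rather than a measure of evolutionary distance.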

That is why we also built the following table36, summarizing in which chromosome each gene is found in the different species.

36 The VPS35 gene is not found as these tables were built according to the alignment tables, and we could not find the alignment of the mentioned gene.


APP PSEN1 PSEN2
HUMAN 21 14 1
CHIMPANZEE 21 14 1
GORILLA 21 14 1
ORANGUTAN 14 1
VERVET-AGM 2 24 25
MACAQUE 3 7 1
OLIVE-BABOON 3 7 1
MARMOSET 21 10
MOUSE 16 12 1
PIG 13 7 10
HORSE 26 24 30

Table 69: Location of the genes with SNPs involved in EOAD. The number shown corresponds to the chromosome in which these genes are found.

ATP13A2 DJ1 FBXO7 LRRK2 PARK2 PINK1 PLA2G6 SNCA

HUMAN 1 1 22 12 6 1 22 4

CHIMPANZEE 1 22 12 6 1 22 4

GORILLA 1 1 22 12 6 1 22 4

ORANGUTAN 1 1 22 12 6 1 22 4

VERVET-AGM 20 10 19 11 13 20 19 7

MACAQUE 1 1 10 11 4 1 10 5

OLIVE-BABOON 1 1 10 11 4 1 10 5

MARMOSET 7 1 9 4 1 3

MOUSE 4 4 10 15 17 4 15

PIG 6 5 5 1 6 5

HORSE 2 28 6 31 2 28 3

Table 70: Location of the genes with SNPs involved in EOPD in the genome. The number shown corresponds to the chromosome in which these genes are found.

As these tables show, the primates phylogenetically closest to humans are the chimpanzee, the gorilla and the orangutan, and they present the studied genes in the same chromosomes as humans. We can also see that the macaque and the olive baboon have the same genes in the same chromosomes, so we suppose that the evolutionary time between them is short. Once more, we see that mice, horses and pigs are the ones showing the least resemblance to humans.

In order to verify the evolutionary time, we looked for a phylogenetic tree that included all the species we chose. After comparing it with our results, we found that it approximately matches our predictions, as mice, horses and pigs are the species furthest from us in the phylogenetic tree. The reason why the match is not exact is that the region we compared is just the few codons involved in EOAD and EOPD, not whole genomes.

Image 9: Phylogenetic tree of several species.37

37 It has been extracted from http://genome.cshlp.org/content/15/7/998/F4.large.jpg


5- Conclusions

1) The first conclusion we can draw from this research project is that some species present, in their ancestral sequence, the same amino acid as the one produced by the SNPs that lead, in 100% of cases, to the development of EOFAD or EOFPD in humans. The main difference is that these species do not seem to suffer from AD or PD; otherwise, we would be stating that a whole species suffers from the disease.

At the beginning of this project we asked ourselves whether or not animals suffer from EOPD or EOAD. We did some research and asked specialists, and in the end we could not find any case of EOAD or EOPD in animals.

On the other hand, it is true that animal models are used to study these two diseases, but after talking to specialists we learnt that the phenotype these animals present, despite being similar, is not the same as in humans.

2) The second conclusion is that, contrary to what we predicted, not all the SNPs found result in an amino acid with very different characteristics from the ancestral one. When we analyzed the results we saw that not all the single nucleotide mutations led to an amino acid from a different group, meaning that a change in the nature of the amino acid does not appear to be essential for damaging the protein in a way that leads to the development of AD or PD.

Additionally, when we aligned all the sequences, we saw that, although some species presented a different nucleotide from humans in a particular position, the resulting amino acid was the same. Therefore, we can affirm that the genetic code is redundant, in the sense that many amino acids can be coded by more than one codon (synonymous codons); that is why many of the changes found are synonymous.

Having learnt that, we propose the following hypothesis: what is crucial to the development of both diseases is the part of the protein that is altered. For instance, depending on the altered part, the protein may suffer major damage, such as losing its tertiary structure. In other words, we think that these diseases develop when one or more SNPs occur in a region of the DNA that codifies for the amino acids which form an important part of the protein, and that the nature of the resulting amino acid is not the determining factor.

3) On the other hand, we can conclude that animals which are phylogenetically closer to humans present fewer differences in their ancestral sequences when compared to humans. By contrast, horses, pigs and mice (the species furthest from humans in the phylogenetic tree) are the ones with the most differences from humans in their DNA sequence. This reflects the evolutionary time between species: the species separated from humans by a greater evolutionary time are the ones furthest from humans in the phylogenetic tree.

4) The fourth conclusion refers to the dN/dS ratio. When we analyze the ratios, we see that they are all less than one, with the single exception of the chimpanzee FBXO7. That is why we can conclude that the genes we have studied appear to be under negative selection, so the genotype with the mutation is not favoured over the ancestral one.


6- Future research

Taking our conclusions into consideration, we realize that more questions may be asked at this point of the research, which may lead to some new research. These questions are the following:

1) Why may a specific amino acid in a certain region of a protein cause so much damage to humans but not to some other species?

2) Is it possible that, in the case of AD, these species have a defense mechanism against amyloid plaques, so that although they produce these plaques they have a natural way to destroy them and therefore do not develop AD?

3) In which way is the structure of the protein altered when the SNPs we have studied occur?

4) Why has no case of animals suffering from EOAD or EOPD been found?

As we can see, many questions arise that remain to be answered. We propose future research that goes one step further and tries to answer these questions, so that further contributions can be made to the study of some of the genetic causes of these two diseases.

In our meeting with Dr. Prous38, we reached some conclusions about future research: we have shown that the genetic features of both EOAD and EOPD in humans can be observed in a variety of animal species. In order to explore potential therapeutic interventions, these findings should be complemented with a study of the physiopathology and phenotypic aspects of the diseases. In this regard, it would be necessary to investigate further whether β-amyloid deposits, neurofibrillary tangles or α-synuclein aggregates are found in the different animal species analyzed in this research project. Likewise, additional research should be carried out to determine whether behavioral, cognitive or psychomotor abnormalities, among others, are encountered in these different animal species.

38 He has a PhD in chemistry and he is vicepresident of the Prous Institute for Biomedical Research. For further information about him, go to: http://www.haxel.com/icic/speakers/josep-prous


7- Glossary

Allelic variant: an alteration in the normal sequence of a gene, the significance of which is often unclear until further study of the genotype and corresponding phenotype occurs in a sufficiently large population. A complete gene sequencing often presents numerous allelic variants (sometimes hundreds) for a given gene.

Apoptosis: the process of programmed cell death that may occur in multicellular organisms.

Black substance (substantia nigra): a dark layer of gray matter separating the tegmentum of the midbrain from the crus cerebri.

cDNA: in genetics, a DNA which is complementary to a given RNA template. In bioinformatics, an mRNA transcript's sequence expressed as DNA bases rather than RNA bases (for example: GCAT instead of GCAU).

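Coding strand: the strand of DNA whose sequence corresponds to that of the RNA transcript (with T in place of U) and therefore reads as the codons that code for each amino acid. It is the complement of the strand that is actually copied during transcription.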

Damaging: it refers to the impact of an amino acid substitution (change) on the structure and function of a human protein.

dN/dS: the ratio of the number of non-synonymous substitutions to the number of synonymous substitutions, which can be used as an indicator of the selective pressure acting on a protein-coding gene.

Evolution: the change in the heritable traits of biological populations over successive generations.

Flanking region: a region of DNA adjacent to a gene (on its 5' or 3' side) that is not transcribed into RNA.

Gene: The fundamental physical and functional unit of heredity. A gene is an ordered sequence of nucleotides located in a particular position on a particular chromosome, that encodes a specific functional product (i.e., a protein or RNA molecule).


Genomics: an area within genetics that concerns the sequencing and analysis of an organism’s genome. Comparing the genome of different species enables us to do many studies regarding evolutionary biology.

Genomic alignment: comparison of the sequence of different species according to their genetic information.

Glial cells: non-neuronal cells that maintain homeostasis, form myelin, and provide support and protection for neurons in the brain and peripheral nervous system.

(Hyper)phosphorylation: addition of a phosphate group to a protein or other organic molecule.

Lewy Bodies: aggregates of protein that develop inside nerve cells in PD.

Missense mutation: when the change of a single base pair causes the substitution of a different amino acid in the resulting protein. This amino acid substitution may have no effect, or it may render the protein nonfunctional.

mRNA or messenger RNA: a single-stranded RNA molecule that is complementary to one of the DNA strands of a gene. The mRNA is an RNA version of the gene that leaves the cell nucleus and moves to the cytoplasm where proteins are made. During protein synthesis, an organelle called a ribosome moves along the mRNA, reads its base sequence, and uses the genetic code to translate each three-base triplet, or codon, into its corresponding amino acid.

Mucopolysaccharides: long chains of sugar molecules found throughout the body.

Natural selection: a key mechanism of evolution. It is the process by which individuals with certain characteristics have a greater survival or reproductive rate than other individuals in a population and pass on these heritable genetic characteristics to their offspring.

Negative or purifying selection: a mode of natural selection that consists of the selective removal of deleterious alleles.

Neutral selection: a theory stating that if a population carries different alleles of a particular gene, each of those alleles is equally good at performing its job.

Non-coding strand: the strand complementary to the coding one. It serves as the template for transcription, so the resulting RNA has the same sequence as the coding strand, with every T changed into U.
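The relationship between the two DNA strands and the RNA can be checked with a short sketch. It assumes that both strands are written in the 5' to 3' direction and uses a made-up six-base sequence. The function names are ours.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe_from_template(template):
    """Return the RNA produced from a template (non-coding) strand.

    The RNA is complementary to the template, so it matches the coding
    strand with every T replaced by U.
    """
    coding = reverse_complement(template)
    return coding.replace("T", "U")


# Hypothetical coding strand and its non-coding partner.
coding_strand = "ATGGCC"
template_strand = reverse_complement(coding_strand)
print(template_strand)                            # prints "GGCCAT"
print(transcribe_from_template(template_strand))  # prints "AUGGCC"
```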

Nonsense mutation: the substitution of a single base pair that leads to the appearance of a stop codon where previously there was a codon specifying an amino acid.

Non-synonymous mutation: a nucleotide mutation that alters the amino acid sequence of a protein.
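The difference between synonymous, missense and nonsense changes can be illustrated by comparing the amino acid encoded before and after a single-base change. This is a minimal sketch that assumes the standard genetic code and DNA codons written on the coding strand. The function name and the "stop-lost" label for a stop codon that becomes an amino acid are our own choices for the example.

```python
from itertools import product

# Standard genetic code for DNA codons (coding strand), in T, C, A, G order.
# One-letter amino acid codes; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a single-base change between two codons."""
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must have three bases")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # an amino acid codon becomes a stop codon
    if ref_aa == "*":
        return "stop-lost"  # a stop codon becomes an amino acid codon
    return "missense"       # one amino acid is replaced by another


print(classify_substitution("GAA", "GAG"))  # prints "synonymous" (Glu to Glu)
print(classify_substitution("GAA", "GTA"))  # prints "missense" (Glu to Val)
print(classify_substitution("GAA", "TAA"))  # prints "nonsense" (Glu to stop)
```

In this scheme, every result other than "synonymous" is a non-synonymous change.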

Orthologue: a homologous gene sequence found in different species (the copies come from a common ancestral gene).

Oxidative stress: an imbalance between the production of free radicals and the ability of the body to counteract or detoxify their harmful effects through neutralization by antioxidants. Oxidative stress can damage DNA bases and, in turn, harm the cells.

Phylogenetic tree or evolutionary tree: a branching diagram that shows the evolutionary relationships among various species based on the similarities and differences in their genes.

Positive or directional selection: a mode of natural selection that occurs when a certain allele confers greater fitness than the others, so that its frequency increases.

Proteolysis: breakdown of proteins into smaller polypeptides or amino acids.

RNA template: a chain of RNA that serves as a pattern for the synthesis of a chain of amino acids.

SNP: stands for single nucleotide polymorphism; a variation at a single nucleotide position in a genetic sequence.

Tangles: a mass of interlaced or intertwisted threads, strands, or other parts [39].

Transcript: a sequence of RNA produced by transcription from a DNA template.

[39] In Catalan, the closest word would be “embolics”; in our case, “embolics de fibres o proteïnes” (tangles of fibres or proteins).

9- Annexes

9.1 Geographic origin of the studied AD mutations

Map 1: This map shows in which countries there have been individuals with a SNP mutation in the corresponding gene.

9.2 Geographic origin of the studied PD mutations

Map 2: This map shows in which countries there have been individuals with a SNP mutation in the corresponding gene.

9.3 CD-rom of supplementary materials

The attached CD contains the tables we built with the information about the SNPs, as well as the alignments.
