Molecular Genetics

AND PATTERNS OF INHERITANCE

Simon Petersen-Jones DVetMed PhD DVOphthal MRCVS DECVO, Department of Small Animal Clinical Science, Michigan State University Veterinary Medical Center, 736 Wilson Road, D-208, East Lansing, MI 48824

INTRODUCTION The genome contains a complete set of instructions to make an organism and control its cellular structures and activities. This information is contained within tightly coiled threads of deoxyribonucleic acid (DNA) found in the nucleus of every cell. The DNA is divided into chromosomes. The human genome consists of about 3 billion basepairs (3 Gb) (the dog genome consists of about 2.4 Gb) and disease may result from an alteration of just one of those basepairs. It is estimated that the human genome contains in the region of 20,000 to 25,000 genes, whereas the dog is estimated to have only ~19,000 genes. Each gene carries the code for making a particular protein. Only a subset of the total number of genes will be functional in a particular cell type. Certain genes, known as house-keeping genes, are expressed in all cell types and are needed for cell function. Other genes are tissue or cell-type specific. Although 20,000 to 25,000 may seem a small number of genes to encode for and control the functions of a complex organism, further complexity is provided by a process of alternative splicing of genes. Alternative splicing between different tissues allows the same gene to produce differing proteins dependent on the tissue in which the gene is active.

From http://www.exonhit.com/alternativesplicing/index.html - see this web page for more information on alternative splicing.

Disease states are all influenced by genetic makeup. Some conditions result primarily from genetically determined factors (e.g. gene mutations). Although the animal’s genes (genotype) may dictate that it develops a disease, the actual characteristics of the disease (disease phenotype) i.e. how the disease is expressed in that individual (such as age of onset, rate of progression) may be influenced by environmental factors. Conversely diseases that at first glance appear totally environmental, such as infections with microorganisms are in fact influenced by genetic factors (e.g. factors that confer disease resistance). The situation may be further complicated when it is appreciated that expression of even simply inherited traits can be modified by “background genetics”. Basically this means that a variation (mutation) of one gene is the cause of the disease, but DNA variations elsewhere (at a modifying locus, or loci) influence the expression of the disease trait.

Perhaps the modifying locus encodes a protein that interacts with the disease protein or can render the animal relatively resistant to the effects of the mutation.

DNA STRUCTURE DNA strands consist of repeating units consisting of a phosphate group, a sugar (deoxyribose for DNA) and a base (adenine, cytosine, guanine or thymine).

The DNA molecules consist of 2 strands that wrap around each other as a double helix. The two strands are joined by hydrogen bonds between the bases. Adenine (A) joins to thymine (T) and cytosine (C) to guanine (G) and thus the two strands are complementary to each other.
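The pairing rules above (A with T, C with G) mean that one strand fully determines the other. The following is a minimal Python sketch of that idea; the function names and the example sequence are illustrative assumptions, not part of the original notes.

```python
# Watson-Crick pairing rules from the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner base for each base, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the partner strand written 5' to 3'.

    The two strands run in opposite directions, so the partner strand
    has to be reversed before it can be read 5' to 3'.
    """
    return complement(strand_5_to_3)[::-1]


top = "ATGCCGTTA"  # hypothetical sequence, written 5' to 3'
print(complement(top))          # partner strand as it lies opposite (3' to 5')
print(reverse_complement(top))  # the same partner strand read 5' to 3'
```

Applying `reverse_complement` twice returns the original sequence, which is one way of seeing that the two strands carry the same information.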

Note the different directionality of each strand – one runs from 5’ to 3’ and the complementary strand runs from 3’ to 5’. 3’ or 5’ refers to the number of the carbon atom of the deoxyribose sugar that forms a phosphodiester bond with the next sugar, as shown below.

Folding of DNA to form a chromosome:

The double helix of DNA is wrapped around 8 histone proteins in 2 loops. The length of the double loop is 146 basepairs. This structure is known as a nucleosome. The DNA strand between adjacent nucleosomes is about 60 basepairs long. The nucleosomes in turn wind into a cable 3 times thicker than the individual nucleosome. Each turn of the cable has about 6 nucleosomes. This cable is known as a chromatin fiber and this is resolvable by EM. During the metaphase stage of cell division the chromosomes become condensed. The chromatin fibers are looped around a central scaffold. Each loop is about 75,000 bp (75kb) long. These are then coiled to form the chromatids.

NB: The significance of the nucleosome is that when a cell undergoes apoptosis (programmed cell death) the inter nucleosome DNA is cut. This results in cut sections of DNA each of which is a multiple of 180bp. This can be demonstrated on an agarose gel where a series of bands at 180bp, 360bp, 540bp…… can be seen, and indicates that apoptosis has occurred. Apoptosis plays a crucial role in the development of many structures including the retina. For example, developing retinal neurons that do not develop neuronal connections will be disposed of by apoptosis. Apoptosis also plays a role in death of cells during disease processes. Examples of this are death by apoptosis of photoreceptors of animals suffering from hereditary retinal dystrophies (e.g. PRA), and of ganglion cells in glaucoma. Several pathways can feed into the end result of apoptosis of the cell. There are currently investigations of experimental treatments to inhibit apoptosis as a way of saving photoreceptors in animals suffering from hereditary retinal dystrophies.

CHROMOSOMES The DNA of the genome is divided into portions – the chromosomes. The nuclei of non-gametes have a diploid (2n) number of chromosomes. This means that there are 2 copies of each chromosome. The gametes (ova or sperm) are haploid (n) and have one copy of each chromosome. A normal diploid genome consists of two sex chromosomes and a number of autosomes (non-sex chromosomes). The number of autosomes varies between species.

Chromosome number of different species:

Species    Diploid number
Human      46
Dog        78
Cat        38
Horse      64
Donkey     62
Sheep      54
Cow        60
Pig        38
Rabbit     44
Mouse      40

The ends of the chromosomes are known as telomeres. These have repeated DNA elements and act to protect the chromosome from degradation during DNA replication.

GENES Genes are lengths of DNA that code for individual proteins. The coding portion of genes only makes up a small percentage of the total DNA of the genome (c. 5%). The coding portions of the genes are split into segments called exons separated by noncoding intervening sequences called introns. The coding of genes is discussed under translation below.

NON-CODING DNA The non-coding DNA serves many functions, some of which have only relatively recently been recognized. In addition to non-coding RNAs such as transfer-RNA and ribosomal-RNA, microRNAs and long non-coding RNAs have been identified. MicroRNAs have a role in controlling translational activity of some genes. The functions of long non-coding RNAs (of which there are thousands) are not fully understood but some have been shown to have regulatory roles in gene expression and in mechanisms such as imprinting and X-chromosome inactivation. Having previously been described as “junk DNA”, these non-coding regions, their functions and their role in disease processes are becoming better understood.

TRANSCRIPTION & TRANSLATION The copying of DNA of a gene into RNA is called transcription. The messenger RNA (mRNA) transports the genetic message out of the nucleus to the cytoplasm where proteins are made by the translation of the mRNA. The mRNA codes for a specific order of specific amino acids. From the start site of translation the code is read in units of three bases (a codon), one codon per amino acid. This coding can be likened to a series of three letter words. The following shows the effect if one letter (basepair) is missing; this is known as a deletion and results in a reading frame shift (the message is incorrect following the site of the deletion):

Wild type:           THE ONE RED EYE WAS OUT TOO FAR
1 basepair deletion: THE ONR EDE YEW ASO UTT OOF
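The three-letter-word analogy above can be reproduced in a few lines of Python. This is only an illustrative sketch: the helper names are made up for the example, and the "message" is the sentence from the notes rather than real sequence data.

```python
def read_in_triplets(message: str) -> list[str]:
    """Split a message into consecutive three-letter 'codons' from the start."""
    letters = message.replace(" ", "")
    return [letters[i:i + 3] for i in range(0, len(letters), 3)]


def delete_at(message: str, index: int) -> str:
    """Remove the letter at position `index` (0-based, spaces ignored)."""
    letters = message.replace(" ", "")
    return letters[:index] + letters[index + 1:]


wild_type = "THE ONE RED EYE WAS OUT TOO FAR"
mutant = delete_at(wild_type, 5)  # drop the final E of ONE

print(" ".join(read_in_triplets(wild_type)))
print(" ".join(read_in_triplets(mutant)))
```

Every "word" after the deletion is shifted by one letter, which is why a single basepair deletion typically changes all downstream amino acids rather than just one.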

One amino acid may be coded for by up to 6 different triplets of basepairs (codons), although most amino acids have 4 or fewer. For many amino acids the first two basepairs of the codon are the same and the last basepair is the variable one (e.g. Alanine is coded for by: GCT, GCC, GCA or GCG). Three different triplets code for the end of the amino acid chain (stop codons TGA, TAG or TAA).
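The redundancy of the genetic code can be checked directly by building the standard codon table. The sketch below assumes the standard (nuclear) genetic code written in the conventional T, C, A, G ordering with one-letter amino acid codes; the variable names are illustrative. In the standard code leucine, serine and arginine each have six codons, whereas methionine and tryptophan have one.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code: one-letter amino acids for codons in the order
# TTT, TTC, TTA, TTG, TCT, ... GGG ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Group the 64 codons by the amino acid (or stop signal) they encode.
codons_for = defaultdict(list)
for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    codons_for[amino_acid].append("".join(codon))

print(codons_for["A"])  # codons for alanine
print(codons_for["*"])  # stop codons
print({aa: len(codons) for aa, codons in codons_for.items()})
```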

RNA is in many ways similar to DNA except:
• It is single stranded
• The backbone sugar is ribose not deoxyribose
• RNA uses the pyrimidine base uracil where DNA uses thymine

Transcription is a complex process under precise control with a number of proteins controlling the rate of transcription. These proteins interact with DNA sequences of the gene promoter. The entire system ensures that the appropriate protein is produced in the correct cell type and at the correct stage of life. The initiation of transcription is when a TATA binding protein binds to a TATA sequence (TATA box) in the promoter. Other factors bind and interact with enhancer and silencer regions. Finally RNA polymerase joins and transcription can start. The DNA is unwound and a complementary RNA copy is made of one strand of the DNA, the template strand (the other strand, which has the same sequence as the RNA apart from T in place of U, is not copied). The following shows a DNA strand and its complementary strand of RNA:

GGTTACCATTAACGAT - DNA CCAAUGGUAAUUGCUA - mRNA
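The DNA/mRNA pair shown above follows a fixed pairing rule (A pairs with U, T with A, C with G, G with C), which can be sketched as below. The function name is an assumption for illustration; strand orientation is ignored and bases are simply paired position by position, as in the example.

```python
# Pairing of a DNA template base with the RNA base added opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(dna: str) -> str:
    """Return the RNA bases that pair with a DNA strand, position by position."""
    return "".join(DNA_TO_RNA[base] for base in dna)


print(transcribe("GGTTACCATTAACGAT"))
```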

As mRNA is being formed the noncoding interruptions of the message (the introns) are spliced out. Introns start and end with a characteristic pair of bases (GT at the beginning and AG at the end). The intron/exon boundaries are known as the splice sites. A mutation at a splice site could mean that the intron is not removed from the mRNA or that an exon is cut out of the mRNA (either would completely alter the message). The primary RNA transcript undergoes splicing, a series of reactions where the intronic RNA segments are removed and the exonic RNA segments joined up (spliced). At the 3’ end of the gene (after the end of the coding sequence) is a polyadenylation signal. This results in the termination of transcription and the addition of several adenylate residues. This is called the poly(A) tail. Some genes may have more than one polyadenylation site used. This results in transcripts with different length 3’ untranslated regions (this difference can be shown on a Northern blot, in which RNA from a tissue is separated by electrophoresis, transferred onto a membrane and probed for the gene of interest).

Translation of the mRNA occurs at the ribosome. The mRNA binds at the initiation site (AUG) to part of the ribosome. Transfer RNA bearing the amino acid methionine binds to the initiation codon and the polypeptide is extended from that point until a “stop” codon is reached.
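Translation as described above (start at the first AUG, read one codon at a time, stop at a stop codon) can be sketched as follows. This assumes the standard genetic code and one-letter amino acid codes; the function name and the example mRNA are hypothetical, and real translation initiation involves more than finding the first AUG.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the message."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: release the polypeptide
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical message: AUG GUA AUU GCU UAA, with untranslated bases at each end.
print(translate("CCAAUGGUAAUUGCUUAAGG"))
```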

CELL DIVISION - MITOSIS Mitosis is the process of cell division. During mitosis DNA is copied to make two copies, one for each of the 2 daughter cells.

Stages of cell cycle:

• Interphase (not dividing)
• Mitosis (M phase) (dividing)

Interphase is divided into G1 phase, S phase and G2 phase.

Mitosis is divided into prophase, prometaphase, metaphase, anaphase & telophase.

Replication of DNA During DNA replication the double-stranded DNA molecule unwinds and each single strand is a template for synthesis of a new, complementary strand.

DNA replication is described as being semi-conservative – each of the two resulting double-stranded molecules has one strand derived from the original DNA and one newly synthesized strand.

Replication starts at specific sites termed origins of replication. Helicases unwind and hold apart the DNA. A specialized RNA polymerase (primase) then assembles complementary RNA nucleotides to build a short piece of RNA called an RNA primer. DNA polymerase copies the DNA from the end of the RNA primer by adding on the A, C, G or T nucleotide to make a complementary strand to the parent template. DNA polymerase proofreads the new strand, removing and replacing mistakes that it may have made. Finally the RNA primer is replaced by DNA. DNA polymerases work in one direction (5’ to 3’), which is fine for one of the strands (the leading strand) but creates difficulties for the other strand (the lagging strand). To overcome this directionality problem the polymerases make short sections of DNA, called Okazaki fragments, that are then joined together by enzymes called ligases.

MEIOSIS Meiosis is the process of cell division by which the gametes (sperm or ovum) are formed. During meiosis there is exchange of material between paired chromosomes. This results in shuffling of the genetic material that was inherited from each parent (each gamete will therefore contain material from both parents of the animal producing it, i.e. from both grandparents of the future offspring).

Meiosis is divided into meiosis I, during which there is a division that reduces the copies of chromosomes from 2n (i.e. a diploid number) to n (i.e. a haploid number), and meiosis II, which is similar to mitosis in that the two chromatids of a chromosome split and end up in daughter cells. The stages of meiosis I are shown in the diagram below. Note that initially there is DNA replication (interphase) so that the cells go from 2n, 2c (2 of each chromosome, one paternally derived and the other maternally derived, and 2 chromatids in total for each chromosome pair) to 2n, 4c (2 of each chromosome and 4 chromatids for each chromosome pair). Then there are the stages of meiosis I resulting in 2 cells that are 1n, 2c. Each of these cells then divides in meiosis II to produce 2 cells that are 1n, 1c. In males 4 gametes are produced from 1 cell. However, in females meiosis of a cell produces only 1 gamete. Meiosis starts in the female embryo and then becomes arrested; the first meiotic division is completed later in life. One of the resulting cells is very small and remains as a polar body and stays attached to the larger cell, which is called a secondary oocyte. If the oocyte is fertilized meiosis II occurs, forming a second polar body and the 1n, 1c egg pronucleus that fuses with the sperm pronucleus.

Note that for each chromosome pair cross over only occurs between one chromatid from each chromosome. The average number of chiasmata (crossover points) during human male meiosis is 50 (approx. 2.36 chiasmata/bivalent).

X-INACTIVATION Female mammals have two X chromosomes while males have only one X-chromosome. If both X chromosomes in females were active this could result in a doubling of X-chromosome gene product in females. To prevent this problem, in each cell one X-chromosome is inactive. The inactive X-chromosome can be seen in interphase cells as a darkly staining chromosome, the Barr body. Mary Lyon in 1961 proposed that the Barr body was the inactive X-chromosome and that the turning off of the chromosome occurred early in embryonic development. The X-chromosome that is turned off may be either the maternally derived or paternally derived chromosome. After X inactivation all cells that arise from that cell will have the same X chromosome turned off. The commonly quoted example is the tortoiseshell or calico cat. An X-linked gene confers black, brown or orange coat color. A female who is heterozygous for this gene will have patches of orange and either brown or black. The earlier the X-inactivation occurs the larger the patches of color. Under what circumstances could a male cat be tortoiseshell or calico?

An ophthalmological example is a carrier female dog for X-linked PRA. The parts of the retina that are derived from cells in which the X chromosome containing the mutated gene was inactivated will be normal, while those derived from a cell lineage in which the normal X chromosome was inactivated will degenerate.

MENDELIAN INHERITANCE Mendel’s laws relate to genes that are on separate chromosomes. The simplest genetic characters depend on the genotype at a single locus. Modes of inheritance for these include autosomal recessive, autosomal dominant, X-linked recessive and X-linked dominant. There are not any recognized Y-linked diseases.

What we learned from Mendel:
1. There is a pair of genes for each trait (i.e. the genotype for a given trait is specified by a pair of genes). (N.B. Some loci can have more than two alleles in the population.)
2. During formation of gametes, the gene pair for a trait segregates equally. The genes in a pair are distributed to the gametes so that each gamete receives only one member of the pair; any gamete has an equal chance of receiving either member of the pair. This is Mendel’s law of equal segregation (Mendel’s first principle).
3. A gene has two forms, called alleles, designated by A and a. Only plants with the genotype aa (homozygous for a) exhibit a recessive phenotype. Plants with the genotype AA (homozygous for A) or Aa (heterozygous) exhibit a dominant phenotype.
4. During the formation of gametes, segregation of the gene pair for any one trait is independent of the segregation of the other gene pairs. As a result, a plant heterozygous for two traits (AaBb) produces gametes containing AB, Ab, aB, and ab with equal probability. This is the law of independent assortment (Mendel’s second principle). Note that the law of independent assortment holds only if the two traits under consideration are on different pairs of homologous chromosomes, or are distant enough from each other on the same chromosome that they are not “linked”.

Punnett squares can be used in following the transmission of traits:

                      Female gametes
                      A        a
Male gametes    A     AA       Aa
                a     Aa       aa

                      Female gametes
                      A        A
Male gametes    A     AA       AA
                a     Aa       Aa
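The Punnett squares above can be generated by pairing every female gamete with every male gamete. The following is a small Python sketch for a single locus with two alleles written as upper and lower case letters; the function names are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def punnett(female: str, male: str) -> Counter:
    """Count offspring genotypes for one locus, e.g. punnett('Aa', 'Aa').

    Each parent passes on one of its two alleles with equal chance, so every
    cell of the square is equally likely. Sorting the two letters puts the
    upper-case allele first, so 'aA' and 'Aa' are counted together as 'Aa'.
    """
    return Counter("".join(sorted(f + m)) for f, m in product(female, male))


def fraction_recessive(counts: Counter) -> float:
    """Fraction of offspring that are homozygous for the lower-case allele."""
    recessive = sum(n for genotype, n in counts.items() if genotype.islower())
    return recessive / sum(counts.values())


carrier_x_carrier = punnett("Aa", "Aa")
clear_x_carrier = punnett("AA", "Aa")
print(carrier_x_carrier, fraction_recessive(carrier_x_carrier))
print(clear_x_carrier, fraction_recessive(clear_x_carrier))
```

For a recessive disease this reproduces the familiar expectations: a carrier to carrier mating gives on average one quarter affected offspring, while a mating of a homozygous normal animal to a carrier gives none.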

Apparent alteration to Mendelian ratios Mendel’s laws apply to the underlying genotype, which is not always reflected in the phenotype:
• Incomplete dominance – different alleles that are expressed in the heterozygote are co-dominant
• Epistasis – the masking of the effect of one gene by another gene
• Incomplete penetrance – some individuals do not express the phenotype
• Pleiotropy – a Mendelian disorder resulting in different manifestations
• Phenocopy – a noninherited, environmentally caused trait that appears to be inherited
• Genetic heterogeneity – a characteristic that can be caused by different genes (e.g. PRA)

Lethal Allele Combinations Some allele combinations can cause embryonic death and thus apparently alter Mendelian ratios. An example is the Mexican hairless dog. Hairlessness is a dominant trait which in the homozygous state causes embryonic death and stillbirths.

NON-MENDELIAN INHERITANCE Non-Mendelian inheritance includes mitochondrial inherited defects (e.g. Leber Hereditary Optic Neuropathy) and imprinting effects.
• Mitochondrial DNA is derived purely from the ovum (sperm do not contribute), therefore it is described as being maternally inherited. Defects due to mutations in mitochondrial DNA are likewise maternally inherited.
• Genomic imprinting is defined as an epigenetic modification of a specific parental allele of a gene, or the chromosome on which it resides, in the gamete or zygote, leading to differential expression of the two alleles of the gene in somatic cells of the offspring (i.e. there is uniparental expression).

MUTATIONS When mutations of genes occur in the germline they can be transmitted to future generations. When they occur in somatic cells there is the possibility of disease in the individual involved. DNA repair enzymes correct most mutations that occur. Alteration of nucleotides may have no effect if they fall in a non-coding region that has no regulatory roles, if they change a regulatory region they may alter gene expression and if they involve a protein coding region they may alter the gene product and thus its function. In some cases this may be disease producing and in others it may confer an advantage (or sometimes both e.g. mutations causing sickle cell anemia). DNA mutations may involve single base pairs or may involve insertions or deletions of DNA and in some cases alterations in the number of repeats of a repetitive element. The latter case is responsible for some diseases resulting from expansion of the number of triplet repeats. Such diseases (e.g. myotonic dystrophy and Huntington disease) show anticipation (i.e. the phenotype worsens in successive generations) because of increasing numbers of repeats in successive generations.

Point mutations (single basepair mutations) may be transitions (purine to purine – A to G or G to A; or pyrimidine to pyrimidine – C to T or T to C), or transversions (purine to pyrimidine or vice versa). Differences in DNA are also very useful; for example, mapping of genes is facilitated by variability in DNA sequence. Single nucleotide polymorphisms (SNPs) are very commonly used in mapping now because they occur every 500 or so base pairs.
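The distinction between transitions and transversions depends only on whether the original and replacement bases belong to the same chemical class. A minimal sketch (the function name is an assumption for illustration):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_point_mutation(ref: str, alt: str) -> str:
    """Classify a single-base DNA change as a transition or a transversion."""
    bases = {ref, alt}
    if not bases <= PURINES | PYRIMIDINES:
        raise ValueError("bases must be A, C, G or T")
    if ref == alt:
        raise ValueError("reference and alternative base are identical")
    # Same class (purine to purine, or pyrimidine to pyrimidine) is a transition.
    same_class = bases <= PURINES or bases <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(ref, "to", alt, "is a", classify_point_mutation(ref, alt))
```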

CHROMOSOMAL DISORDERS Deletions of portions of chromosomes, translocations of parts of one chromosome to another chromosome and deletions or repetitions of chromosomes are associated with disease (the commonest example is trisomy 21 or Down syndrome).

IDENTIFYING THE CAUSE OF GENETIC DISEASE Identifying the DNA mutation responsible for a particular disease has been likened to looking for a needle in a haystack where the needle looks the same as a piece of hay (it may be 1 basepair altered out of ~2.4 billion basepairs (for a dog)). The disease phenotype can help considerably. For some genetic diseases a known biochemical defect may have been identified making genes encoding proteins involved in that biochemical defect likely candidates for being the mutated gene. For example with metabolic diseases identification that a particular protein was absent makes the gene encoding that protein a good candidate for investigation.

Candidate gene approach With this approach information about the disease phenotype is considered and the question is asked: what gene, if defective, could cause this disease? In some instances there are only a small number of genes that could cause the phenotype, making this approach feasible. Some conditions, such as PRA, can result from mutations in large numbers of genes, making a candidate gene approach more challenging. However, the candidate gene approach is still utilized. For example the identification of the PDE6A gene mutation that causes the rcd3 form of PRA in Cardigan Welsh Corgis and the HSF4 gene mutation responsible for cataract in some breeds were both examples of screening of candidate genes. The candidate genes can be screened by sequencing of the entire gene or by using genetic markers very close to, or within, the gene to look for association between the marker and the disease.

IDENTITY BY DESCENT Association mapping and association screening of candidate gene loci rely on identity by descent. For a single genetic disease the causal mutation occurred in an animal many generations previously. All the animals with that disease have inherited the mutation from the founder animal. The region of chromosome surrounding the mutation is also inherited from the founder animal and is identical between all the descendant affected animals (as indicated in the diagram below). Because of crossovers at meiosis over many generations, the region of identity by descent narrows (indicated as “disease interval” in the diagram below).

autozygosity.org

GENOME-WIDE MAPPING As genetic markers became available it was possible to map the genetic position of the disease by its linkage or association to a genetic marker. Initially physical markers (e.g. blood groups) were used, then minisatellites and restriction fragment length polymorphisms, followed by microsatellites and now single nucleotide polymorphisms.

Genetic markers:
• Minisatellites (VNTRs – variable number of tandem repeats) = repeats of stretches of DNA; the number of repeats is variable and inherited in a Mendelian fashion.
• Restriction fragment length polymorphisms (RFLP) = a change in DNA sequence that adds or removes a site for a restriction enzyme to cut the DNA. Genomic DNA was digested with a restriction enzyme and then a DNA probe applied. The size of fragments that the probe lit up varied between individuals if they had different restriction enzyme cutting sites in the DNA that was probed.
• SINEs (short interspersed elements) = retrotransposon mobile elements that propagate in the host genome by a “copy and paste” mechanism. There are insertions across the genome. They can be used for mapping and are biallelic – each particular SINE may be either present or absent.
• Microsatellites = repeats of 2, 3, 4, 5 or more nucleotides (the commonest dinucleotide repeats are CA repeats). The number of repeats is variable and inherited in a Mendelian fashion. Microsatellite sets evenly distributed across genomes have been identified and used for disease mapping.
• Single nucleotide polymorphisms (SNPs) are explained above. They are now widely used for mapping. SNP microarrays are available for several species (including dog and cat) and are widely used for disease mapping. The Illumina high density canine SNP array has over 170,000 polymorphic SNPs spread across the genome. Individual animals are genotyped at each of the 170,000+ SNP loci. SNPs are biallelic (there are two versions in a population).
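To illustrate how a restriction fragment length polymorphism arises, the sketch below digests two versions of a short sequence that differ by a single base. The sequences are invented for the example and the helper name is an assumption; the enzyme used is EcoRI, which recognizes GAATTC and cuts after the first G.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at every site.

    `cut_offset` is the number of bases into the recognition site at which
    the enzyme cuts the strand being examined.
    """
    cuts = []
    position = sequence.find(site)
    while position != -1:
        cuts.append(position + cut_offset)
        position = sequence.find(site, position + 1)
    edges = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(edges, edges[1:])]


ECORI = ("GAATTC", 1)  # recognition site and cut position (G^AATTC)

allele_1 = "TTACGGAATTCGGCATTAGC"  # hypothetical allele carrying the site
allele_2 = "TTACGGAGTTCGGCATTAGC"  # one base change destroys the site

print(digest(allele_1, *ECORI))  # cut into two fragments
print(digest(allele_2, *ECORI))  # remains a single fragment
```

The two alleles give different fragment sizes after digestion, which is the difference that is visualized on a gel or with a probe.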

LINKAGE – DISEASE MAPPING Linked genes, or genetic markers, do not assort independently (i.e. they do not follow Mendelian ratios) because they are on the same chromosome. The closer they are the less likely they are to become separated by a crossover during meiosis. When a crossover between linked genes does occur it is described as a recombination. Because there are two copies of each autosome the distinction must be made between alleles that are on one copy of the chromosome (i.e. either the maternally derived or paternally derived chromosome) and alleles which are on different copies of the chromosome. A group of alleles linked together on the same copy of the chromosome is described as a haplotype. Following the inheritance of linked markers through a pedigree can enable one to determine the haplotype and therefore which chromosome was inherited from which parent.

In this example we are considering two loci (locus A, alleles A1 and A2; locus B, alleles B1 and B2). Where this can be deduced, the shaded boxes show which combination of alleles has been received from the father. It is possible to see which individuals in generation III were conceived from recombinant sperm (marked R; nonrecombinants marked NR). It is not possible to identify any recombinations from the female in generation II.

Investigation of the crossover frequency between linked markers enables the construction of linkage maps. These show the order of markers (or genes) on chromosomes and the relative distances between them. When alleles at tightly linked loci are found together in a population more often than expected by chance, this is called linkage disequilibrium.

The lod score (logarithm of the odds), Z, is the logarithm (base 10) of the odds that two loci are linked (with recombination fraction θ) rather than unlinked (recombination fraction 0.5). A lod score of 3 or above is taken as evidence of linkage (a lod score of 3 is equivalent to odds of 1,000:1). The recombination fraction, θ, is the proportion of recombinations out of the total of recombinations and nonrecombinations. As only two out of four gametes are affected by a crossover, the recombination fraction can be at most 50%. When loci are close together and a recombination between them is unlikely θ tends to zero. The genetic distance, x, is defined as the average number of crossover points between two loci on a gamete. It is measured in Morgans (M) or centimorgans (cM). For small distances x = θ. For example θ = 0.02 corresponds to 0.02 Morgans (2cM).
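As a worked illustration of the definitions above, the sketch below computes a lod score for the simplest situation, in which every meiosis can be scored unambiguously as recombinant or nonrecombinant (phase known). Under that assumption the likelihood of the data at recombination fraction θ is θ^R × (1 − θ)^NR, and this is compared with the likelihood at θ = 0.5. The function names are illustrative; real pedigrees usually need dedicated linkage software.

```python
import math


def recombination_fraction(recombinants: int, nonrecombinants: int) -> float:
    """Proportion of scored meioses that are recombinant."""
    return recombinants / (recombinants + nonrecombinants)


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """log10 of likelihood(theta) / likelihood(0.5) for phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0 and recombinants > 0:
        return -math.inf  # a single recombinant rules out theta = 0
    total = recombinants + nonrecombinants
    log_linked = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    return log_linked - total * math.log10(0.5)


def approx_centimorgans(theta: float) -> float:
    """Genetic distance in cM, valid only for small theta (where x = theta)."""
    return theta * 100.0


# Hypothetical family data: 1 recombinant among 20 informative meioses.
theta_hat = recombination_fraction(1, 19)
print(theta_hat, lod_score(1, 19, theta_hat))
# Ten meioses with no recombinants just reach the conventional threshold of 3.
print(lod_score(0, 10, 0.0))
print(approx_centimorgans(0.02))
```

Each fully informative nonrecombinant meiosis contributes log10(2), about 0.3, to the lod score at θ = 0, which is why roughly ten such meioses are needed to reach a lod score of 3.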

ASSOCIATION MAPPING Association mapping is a technique that is often used to map a genetic disease using the SNP microarrays. Following genotyping of a series of affected and control animals a statistical analysis is applied to see if at any of the SNP loci one version of the SNP is significantly more frequently present in the affected population compared to the controls. The results are often displayed graphically in what is commonly referred to as a Manhattan plot. The x-axis displays the chromosomal location of the SNP and the y-axis how highly associated it is with the disease phenotype. The resulting plot looks like a city skyline with the higher “skyscrapers” indicating the more highly associated SNPs. When analyzing this you hope to see a run of SNPs on a chromosomal region that are highly associated with the disease state. This run of SNPs is anticipated to flank the disease locus.

HOMOZYGOSITY MAPPING Another approach to the analysis of SNP microarray data suitable for recessive conditions is homozygosity mapping. With this technique the SNP genotype data is analyzed to identify regions spanning several SNPs in a row where the affected animals are all homozygous for the same version of each SNP and yet the unaffected animals have variation in the version of the SNPs. It is anticipated that because of identity by descent the affected animals will have a run of SNPs flanking the disease locus over which they share the same version.
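Both analyses described above can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the method used by any particular software package: association is tested with a plain chi-square test on allele counts at one SNP (1 degree of freedom), the Manhattan plot height is taken as −log10 of the p-value, and genotypes are coded with the hypothetical labels "AA", "AB" and "BB". All data are invented.

```python
import math


def allelic_chi_square(case_a: int, case_b: int, control_a: int, control_b: int) -> float:
    """Pearson chi-square for a 2x2 table of allele counts (cases vs controls)."""
    n = case_a + case_b + control_a + control_b
    numerator = n * (case_a * control_b - case_b * control_a) ** 2
    denominator = ((case_a + case_b) * (control_a + control_b)
                   * (case_a + control_a) * (case_b + control_b))
    return numerator / denominator if denominator else 0.0


def manhattan_height(chi_square: float) -> float:
    """-log10(p) for a chi-square statistic with 1 degree of freedom."""
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return -math.log10(p_value) if p_value > 0 else math.inf


def shared_homozygous_run(affected, unaffected):
    """Longest run of consecutive SNPs where all affected animals are homozygous
    for the same allele while at least one unaffected animal differs.

    Each animal is a list of genotype calls ordered along a chromosome.
    Returns (first_index, last_index) or None if there is no such SNP.
    """
    n_snps = len(affected[0])
    best, start = None, None
    for i in range(n_snps + 1):  # one extra step closes a run at the end
        qualifies = False
        if i < n_snps:
            calls = {animal[i] for animal in affected}
            qualifies = (len(calls) == 1
                         and next(iter(calls)) in ("AA", "BB")
                         and any(animal[i] not in calls for animal in unaffected))
        if qualifies and start is None:
            start = i
        elif not qualifies and start is not None:
            if best is None or i - start > best[1] - best[0] + 1:
                best = (start, i - 1)
            start = None
    return best


# Hypothetical SNP: allele A seen 18 times in cases, 8 times in controls.
chi2 = allelic_chi_square(18, 2, 8, 12)
print(chi2, manhattan_height(chi2))

# Hypothetical genotypes at five consecutive SNPs.
affected = [["AB", "AA", "BB", "AA", "AB"],
            ["AA", "AA", "BB", "AA", "BB"]]
unaffected = [["AB", "AB", "BB", "AB", "AA"],
              ["AA", "BB", "AB", "AA", "AB"]]
print(shared_homozygous_run(affected, unaffected))
```

In practice the same calculation is repeated at every SNP on the array, and the resulting heights are plotted against chromosomal position to give the Manhattan plot.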

GENOME PROJECTS The genomes of several species have been sequenced and are publicly available. A useful web site for looking at the genomes available is the University of California, Santa Cruz genome browser: http://genome.ucsc.edu/

WHOLE GENOME SEQUENCING Whole genome sequencing (WGS) is performed using a technique called Next Generation (NexGen) sequencing. This technique is quite different from the “Sanger” sequencing that is used to sequence small stretches of DNA. NexGen sequencing (NGS) involves cutting the DNA into short segments, each of which is sequenced on a high-throughput sequencing machine, providing millions of reads of DNA. The individual short sequence reads are then placed onto the reference genome sequence for the species undergoing NGS. Multiple fragments are aligned to the reference sequence and the average number of fragments that align to each position of the genome gives the “depth of coverage”. The more fragments that are sequenced and aligned, the greater the coverage. The sequence provided for the individual animal being sequenced is only as good as the reference sequence for the species (the backbone on which the sequenced fragments of DNA are assembled). Software programs can then be used to identify DNA differences in the known genes between the reference sequence and the genome of the animal sequenced. There will be a large number of differences and identifying which of these are disease causing can be difficult. Often WGS is used after a disease locus has been mapped so that the DNA changes within that mapped region can be investigated.
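Depth of coverage as described above can be estimated from the number of reads, or counted position by position once the reads are aligned. The sketch below shows both; the read counts, read length and function names are hypothetical, and the estimate assumes that every read aligns somewhere on the reference.

```python
def mean_depth_of_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average number of reads covering each base, assuming all reads align."""
    return n_reads * read_length_bp / genome_size_bp


def depth_per_position(read_starts, read_length: int, reference_length: int):
    """Count how many aligned reads overlap each position of a reference."""
    depth = [0] * reference_length
    for start in read_starts:
        for position in range(start, min(start + read_length, reference_length)):
            depth[position] += 1
    return depth


# Hypothetical run: 480 million reads of 150 bp on a ~2.4 billion bp dog genome.
print(mean_depth_of_coverage(480_000_000, 150, 2_400_000_000))

# Toy example: three 4 bp reads aligned to an 8 bp reference.
toy_depth = depth_per_position([0, 2, 4], 4, 8)
print(toy_depth, sum(toy_depth) / len(toy_depth))
```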

HOW DO WE KNOW THAT A DNA CHANGE CAUSES A SPECIFIC DISEASE? As there can be many different DNA variations that are within a mapped disease locus it is important to be sure that we have identified the actual disease causing mutation rather than a linked DNA variation. Some DNA changes very obviously alter the gene coding and are expected to result in a lack of gene product – these situations are relatively straightforward. If the DNA alteration just changes an amino acid in the final protein it may be more difficult to know whether that results in disease or not.

An example of the difficulty in ascertaining if a given DNA variant is disease causing is an insertion in the RPGRIP1 gene that was initially described as the cause of a cone-rod dystrophy form of PRA in the miniature longhaired dachshund and subsequently found to be present in several other breeds (for example the English Springer Spaniel, where it occurs at a higher frequency than PRA, thus bringing into doubt the significance of the insertion in this breed). The RPGRIP1 gene has multiple splice variants that are expressed in the retina and not all of these include the part of the gene with the insertion. It is now recognized that some dogs with this insertion do not develop PRA. Current work suggests that there may be a second genetic locus on the same chromosome, a variation at which is necessary for photoreceptor degeneration to occur. This highlights the potential complexity of some genetic diseases.

Ideally, supporting phenotypic data should be provided for a given presumptive disease-causing mutation. This could include loss of gene product prior to loss of the cells in which it is expressed (shown on immunohistochemistry or Western blotting), loss of a biochemical pathway in which the gene product is important, loss of function of specific cells, accumulation of the gene product in an abnormal site within a cell (resulting in a deleterious effect on the cell leading to cell death) etc.

GENETIC TESTS

Genetic tests can be divided into those that identify the causal gene mutation directly and those that use a closely linked marker (marker or linkage test).

Direct identification of mutation
Once a mutation has been identified as causing a particular disease it is a relatively simple task to design a test to identify the presence/absence of the mutation. Most tests are PCR-based. Examples include:
• DNA sequencing.
• Straight electrophoretic analysis of the PCR product if the mutation is an insertion or deletion of sufficient size (e.g. the RPE65 gene mutation that causes the so-called “stationary night-blindness” in briards is a 4bp deletion that can be detected on a high-percentage agarose gel or a polyacrylamide gel).
• Restriction enzyme digestion, if the mutation introduces, removes or alters a recognition site for a restriction enzyme (the PCR product is digested with the restriction enzyme and run on an agarose gel).
• Mismatch PCR. If the mutation does not naturally alter a restriction enzyme cutting site, PCR primers can be altered so that either the normal or the mutant product contains a restriction site. The PCR is followed by restriction enzyme digestion and electrophoretic analysis. An example is the first test for PRA, which was developed for the rcd1 form of PRA in the Irish setter.

[Figure: Mismatch PCR Test for rcd1-PRA in Irish Setters. The mutant and normal alleles are amplified by PCR and the product is digested with the restriction enzymes Bfa1 and Bsr1. Gel lanes: M P 1a 1b 2a 2b 3a 3b 4a 4b 5a 5b 6a 6b.]

Key: M = DNA size markers; P = uncut PCR product; a = PCR product digested with Bfa1 (cuts mutant); b = PCR product digested with Bsr1 (cuts normal); 1 = affected control; 2 = carrier control; 3 = unaffected control; 4, 5 & 6 = test animals. What are the results?
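The logic of a restriction digest test can be sketched in a few lines. This is a toy model only: the two 26-base "amplicons" are invented and are not the real rcd1 sequences, and the recognition sequence (CTAG, cut after the first base) is an assumption that should be checked against the enzyme supplier's data. The sketch predicts the band sizes expected when an enzyme that cuts only the mutant product is used.

```python
def digest(sequence, site, cut_offset):
    """Fragment lengths after cutting `sequence` at every occurrence of `site`.

    `cut_offset` is the number of bases into the site at which the cut falls.
    Only the given strand is searched, which suits a palindromic site.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    edges = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(edges, edges[1:])]


def gel_bands(alleles, site, cut_offset):
    """Distinct fragment sizes expected on the gel for a dog with these two alleles."""
    sizes = set()
    for allele in alleles:
        sizes.update(digest(allele, site, cut_offset))
    return sorted(sizes)


# Hypothetical amplicons differing at one base; only the mutant contains CTAG.
NORMAL = "AAGGTTCCAACTGGGTTCAAGGACCA"
MUTANT = "AAGGTTCCAACTAGGTTCAAGGACCA"
SITE, OFFSET = "CTAG", 1  # assumed recognition site and cut position

affected = gel_bands([MUTANT, MUTANT], SITE, OFFSET)
carrier = gel_bands([NORMAL, MUTANT], SITE, OFFSET)
unaffected = gel_bands([NORMAL, NORMAL], SITE, OFFSET)
```

With an enzyme that cuts only the mutant product, an affected dog shows only the cut fragments, an unaffected dog only the uncut product, and a carrier shows both. The second digest (cutting only the normal product) gives the mirror-image pattern and serves as a cross-check.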


• Allele-specific PCR. In this test the PCR primers are designed in two sets, one will only amplify the mutant allele and the other will only amplify the normal allele. An example of this is a test that we developed to identify the presence/absence of a 1 bp deletion in the alpha subunit of cGMP phosphodiesterase (PDE6A) that causes PRA in Cardigan Welsh corgis.

Allele-specific PCR test

Test DNA                                            normal   affected   carrier
PCR designed to amplify only the normal gene           +        -          +
PCR designed to amplify only the mutant gene           -        +          +

[Figure: PCR Diagnostic Test for PRA in Cardigan Welsh Corgis. Gel lanes, upper row: 1a 1b 2a 2b 3a 3b 4a 4b 5a 5b 6a 6b M. Gel lanes, lower row: 7a 7b 8a 8b Na Nb Ca Cb Aa Ab -ve a -ve b M.]

Key: a = PCR for normal; b = PCR for mutant; M = DNA size markers; N = normal control; C = carrier control; A = affected control; -ve = negative control. What are the results for dogs 1 – 8?
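Reading an allele-specific PCR gel amounts to a simple lookup on the two reactions run for each dog. The sketch below encodes the +/- pattern given for the allele-specific PCR test; the function name and the handling of a double-negative result are illustrative assumptions.

```python
def call_genotype(normal_pcr_band, mutant_pcr_band):
    """Interpret the two allele-specific reactions for one dog.

    normal_pcr_band: True if reaction "a" (normal-specific primers) gave a band.
    mutant_pcr_band: True if reaction "b" (mutant-specific primers) gave a band.
    """
    if normal_pcr_band and mutant_pcr_band:
        return "carrier"
    if normal_pcr_band:
        return "normal"
    if mutant_pcr_band:
        return "affected"
    # Neither reaction worked: the sample cannot be interpreted.
    return "no result (repeat the PCR)"


# Hypothetical readings for three dogs: (band in lane a, band in lane b).
readings = {"dog X": (True, False), "dog Y": (True, True), "dog Z": (False, True)}
calls = {dog: call_genotype(a, b) for dog, (a, b) in readings.items()}
```

The control lanes matter because a missing band can mean a failed reaction rather than a missing allele. The normal, carrier and affected controls should give the expected patterns, and the negative controls should give no band.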


Linkage-based test
The initial test offered for the prcd form of PRA was based on a linked DNA marker; once the causal mutation was identified, a mutation detection test was developed. The linked marker test was capable of accurately identifying homozygous unaffected dogs. The PRA mutation only occurred with (was linked to) a particular version of the marker; however, this version of the marker also occurred in association with the normal allele. Therefore the dogs that were homozygous for the disease-associated form of the marker included all the prcd-affected dogs but also a percentage of the normal dogs. The same is true for dogs that are heterozygous for the marker: this group will contain all prcd-carriers, but also some normal dogs.
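The reason a linked marker test gives definite answers only for some dogs can be shown by enumeration. This sketch assumes a simplified model taken from the description above: the mutation is found only on chromosomes carrying the disease-associated marker version (written "D" here), while that marker version can also sit next to a normal allele. The genotype notation and function name are hypothetical, and recombination between marker and mutation is ignored.

```python
from itertools import product


def possible_statuses(marker_genotype):
    """Disease statuses compatible with a two-letter marker genotype.

    "D" = disease-associated marker version; any other letter = a marker
    version never seen with the mutation.
    """
    # A "D" chromosome may carry either allele; any other chromosome is normal.
    options = [("normal", "mutant") if m == "D" else ("normal",)
               for m in marker_genotype]
    labels = {0: "normal", 1: "carrier", 2: "affected"}
    return {labels[alleles.count("mutant")] for alleles in product(*options)}


no_copies = possible_statuses("dd")
one_copy = possible_statuses("Dd")
two_copies = possible_statuses("DD")
```

Under this model only dogs with no copy of the disease-associated marker get a definite result (normal), which is why a direct mutation test replaced the marker test once the mutation was known.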

Illustrations have been taken from the following:

Lewis R. Human Genetics. Concepts and Applications. McGraw-Hill (this book is very easy to follow and well illustrated)

Strachan T & Read AP Human Molecular Genetics. Garland Science

Useful Web pages:

Online Mendelian Inheritance in Man (OMIM): http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM

Online Mendelian Inheritance in Animals (OMIA): http://omia.angis.org.au/home/

Inherited Diseases in Dogs: http://www.vet.cam.ac.uk/idid/

Human Genome Project Information: http://www.genome.gov/10001772

Dog Genome Project: https://www.broadinstitute.org/mammals/dog and http://research.nhgri.nih.gov/dog_genome

RetNet - genes causing retinal disease: https://sph.uth.edu/retnet/


Hereditary Eye Disease Genes

Disclaimer: The field is moving very quickly and this list will soon be out of date. A number of different laboratories offer genetic testing. Check with each laboratory for a current list of tests and how to submit samples for testing.

Breed | Disease | Gene
Abyssinian, Somali, Ocicat, Siamese and related breeds | arPRA - RdAc | CEP290
Abyssinian, Somali cat breeds | adPRA - Rdy | CRX
Alaskan Malamute | Cone degeneration (achromatopsia) | CNGB3
American Cocker Spaniel | arPRA - prcd | PRCD
American Eskimo Dog | arPRA - prcd | PRCD
American Pit Bull | crd2 | ?
Australian Cattle Dog | arPRA - prcd | PRCD
Australian Cattle Dog | PLL | ADAMTS17
Australian Shepherd | CEA-CH | Intronic mutation in NHEJ1
Australian Shepherd | Cone degeneration (achromatopsia) | CNGB3
Australian Shepherd | cmr-1 | BEST1
Australian Shepherd | arPRA - prcd | PRCD
Australian Shepherd | Hereditary cataract | Heat shock factor 4 (HSF4)
Australian Stumpy Tail Cattle Dog | arPRA - prcd | PRCD
Beagle | POAG | ADAMTS10
Border Collie | CEA-CH | Intronic mutation in NHEJ1
Boston Terrier | Hereditary cataract (early-onset form only) | HSF4
Boykin Spaniel | CEA-CH | Intronic mutation in NHEJ1
Briard | Retinal dystrophy | RPE65
Bull Mastiff | adPRA | rhodopsin
Bull Mastiff | cmr-1 | bestrophin 1 (BEST1)
Cane Corso | cmr-1 | BEST1
Cardigan Welsh Corgi | arPRA - rcd3 | PDE6A
Cavalier King Charles Spaniel | curly coat dry eye/CKCSID | FAM83H
Chesapeake Bay Retriever | arPRA - prcd | PRCD
Chinese Crested | arPRA - prcd | PRCD
Chinese Crested | PLL | ADAMTS17
Coton du Tulear | cmr-2 | BEST1
Dogue de Bordeaux | cmr-1 | BEST1
(English) Cocker Spaniel | arPRA - prcd | PRCD
English and Bull Mastiff | cmr-1 | BEST1
English Springer Spaniel | PRA - cord1 | RPGRIP1
Entlebucher Sennenhund | arPRA - prcd | PRCD
Finnish Lapphund | arPRA - prcd | PRCD
French Bulldog | Hereditary cataract | HSF4
German Shorthaired Pointer | Cone degeneration (achromatopsia) | CNGB3
Glen of Imaal Terrier | crd3 | ADAM9
Golden Retriever | arPRA - prcd | PRCD
Golden Retriever | arPRA (GR_PRA1) | SLC4A3
Golden Retriever | arPRA (GR_PRA2) | Not yet published
Gordon Setter | Late-onset PRA (LOPRA) - rcd4 | C2orf71
Great Pyrenees (Pyrenean Mountain Dog) | cmr-1 | BEST1
Irish Red and White Setter | arPRA - rcd1 | PDE6B
Irish Setter | arPRA - rcd1 | PDE6B
Irish Setter | Late-onset PRA (LOPRA) - rcd4 | C2orf71
Jack Russell Terrier | PLL | ADAMTS17
Jagdterrier | PLL | ADAMTS17
Karelian Bear Dog | arPRA - prcd | PRCD
Kuvasz | arPRA - prcd | PRCD
Labrador Retriever | arPRA - prcd | PRCD
Labrador Retriever | OSD - drd1 | COL9A2
Lancashire Heeler | CEA-CH | Intronic mutation in NHEJ1
Lancashire Heeler | PLL | ADAMTS17
Lapponian Herder | cmr-3 | BEST1
Lapponian Herder | arPRA - prcd | PRCD
Longhaired Whippet | CEA-CH | Intronic mutation in NHEJ1
Markiesje | arPRA - prcd | PRCD
Miniature Bull Terrier | PLL | ADAMTS17
Miniature Long-haired, Smooth-haired and Wire-haired Dachshund | PRA - cord1 | RPGRIP1
Miniature Wire-haired Dachshund | crd | NPHP4
Miniature and Toy Poodle | arPRA - prcd | PRCD
Miniature Schnauzer | arPRA - type A | Not yet published
Norwegian Elkhound | arPRA - prcd | PRCD
Norwegian Elkhound | arPRA - erd | STK38L
Nova Scotia Duck Tolling Retriever | CEA-CH | Intronic mutation in NHEJ1
Nova Scotia Duck Tolling Retriever | arPRA - prcd | PRCD
Old English Mastiff | adPRA | rhodopsin
Old English Mastiff | cmr-1 | BEST1
Papillon | arPRA - PRAtype1 | CNGB1
Parson Russell Terrier | PLL | ADAMTS17
Patterdale Terrier | PLL | ADAMTS17
Perro de Presa Canario | cmr-1 | BEST1
Portuguese Water Dog | arPRA - prcd | PRCD
Rat Terrier | PLL | ADAMTS17
Rough & Smooth Collie | rcd2 | C1orf36
Rough & Smooth Collie | CEA-CH | Intronic mutation in NHEJ1
Samoyed | xlPRA | RPGR
Samoyed | OSD - drd2 | COL9A3
Schapendoes | arPRA | CCDC66
Sealyham Terrier | PLL | ADAMTS17
Shetland Sheepdog | CEA-CH | Intronic mutation in NHEJ1
Siberian Husky | xlPRA | RPGR
Silken Windhound | CEA-CH | Intronic mutation in NHEJ1
Silky Terrier | arPRA - prcd | PRCD
Sloughi | arPRA - rcd1a | PDE6B
Spanish Water Dog | arPRA - prcd | PRCD
Staffordshire Bull Terrier | Hereditary cataract | HSF4
Standard Wire-Haired Dachshund | crd | NPHP4
Swedish Lapphund | arPRA - prcd | PRCD
Tenterfield Terrier | PLL | ADAMTS17
Tibetan Terrier | PLL | ADAMTS17
Toy Fox Terrier | PLL | ADAMTS17
Volpino Italiano | PLL | ADAMTS17
Welsh Terrier | PLL | ADAMTS17
Wire-haired Fox Terrier | PLL | ADAMTS17
Yorkshire Terrier | PLL | ADAMTS17
Yorkshire Terrier | arPRA - prcd | PRCD

Key:
adPRA: autosomal dominant progressive retinal atrophy
arPRA: autosomal recessive progressive retinal atrophy
CEA-CH: Collie eye anomaly – choroidal hypoplasia
cmr: canine multifocal retinopathy
cord: cone-rod dystrophy
crd: cone-rod dystrophy
drd: dwarfism with retinal dysplasia
erd: early retinal degeneration
OSD: ocular skeletal dysplasia
PLL: primary lens luxation
POAG: primary open-angle glaucoma
prcd: progressive rod-cone degeneration
rcd: rod-cone dysplasia
RdAc: retinal degeneration in the Abyssinian cat
Rdy: rod-cone dysplasia
xlPRA: X-linked progressive retinal atrophy


Ophthalmic Disease Genes in Dogs, Cats and Horses


Glossary (from DOE Human Genome Program Primer on Molecular Genetics. http://www.ornl.gov/hgmis/publicat/primer/intro.html)

Adenine (A): A nitrogenous base, one member of the base pair A-T (adenine-thymine). Alleles: Alternative forms of a genetic locus; a single allele for each locus is inherited separately from each parent (e.g., at a locus for eye color the allele might result in blue or brown eyes). Amino acid: Any of a class of 20 molecules that are combined to form proteins in living things. The sequence of amino acids in a protein and hence protein function are determined by the genetic code. Amplification: An increase in the number of copies of a specific DNA fragment; can be in vivo or in vitro. See cloning, polymerase chain reaction. Arrayed library: Individual primary recombinant clones (hosted in phage, cosmid, YAC, or other vector) that are placed in two-dimensional arrays in microtiter dishes. Each primary clone can be identified by the identity of the plate and the clone location (row and column) on that plate. Arrayed libraries of clones can be used for many applications, including screening for a specific gene or genomic region of interest as well as for physical mapping. Information gathered on individual clones from various genetic linkage and physical map analyses is entered into a relational database and used to construct physical and genetic linkage maps simultaneously; clone identifiers serve to interrelate the multi-level maps. Compare library, genomic library. Autoradiography: A technique that uses X-ray film to visualize radioactively labeled molecules or fragments of molecules; used in analyzing length and number of DNA fragments after they are separated by gel electrophoresis. Autosome: A chromosome not involved in sex determination. The diploid human genome consists of 46 chromosomes, 22 pairs of autosomes, and 1 pair of sex chromosomes (the X and Y chromosomes). Bacteriophage: See phage. Base pair (bp): Two nitrogenous bases (adenine and thymine or guanine and cytosine) held together by weak bonds. 
Two strands of DNA are held together in the shape of a double helix by the bonds between base pairs. Base sequence: The order of nucleotide bases in a DNA molecule. Base sequence analysis: A method, sometimes automated, for determining the base sequence. Biotechnology: A set of biological techniques developed through basic research and now applied to research and product development. In particular, the use by industry of recombinant DNA, cell fusion, and new bioprocessing techniques. bp: See base pair. cDNA: See complementary DNA. Centimorgan (cM): A unit of measure of recombination frequency. One centimorgan is equal to a 1% chance that a marker at one genetic locus will be separated from a marker at a second locus due to crossing over in a single generation. In human beings, 1 centimorgan is equivalent, on average, to 1 million base pairs. Centromere: A specialized chromosome region to which spindle fibers attach during cell division. Chromosomes: The self-replicating genetic structures of cells containing the cellular DNA that bears in its nucleotide sequence the linear array of genes. In prokaryotes, chromosomal DNA is circular, and the entire genome is carried on one chromosome. Eukaryotic genomes consist of a number of chromosomes whose DNA is associated with different kinds of proteins. Clone bank: See genomic library. Clones: A group of cells derived from a single ancestor. Cloning: The process of asexually producing a group of cells (clones), all genetically identical, from a single ancestor. In recombinant DNA technology, the use of DNA manipulation procedures to produce multiple copies of a single gene or segment of DNA is referred to as cloning DNA. 
Cloning vector: DNA molecule originating from a virus, a plasmid, or the cell of a higher organism into which another DNA fragment of appropriate size can be integrated without loss of the vector’s capacity for self-replication; vectors introduce foreign DNA into host cells, where it can be reproduced in large quantities. Examples are plasmids, cosmids, and yeast artificial chromosomes; vectors are often recombinant molecules containing DNA sequences from several sources. cM: See centimorgan. Code: See genetic code.


Codon: See genetic code. Complementary DNA (cDNA): DNA that is synthesized from a messenger RNA template; the single-stranded form is often used as a probe in physical mapping. Complementary sequences: Nucleic acid base sequences that can form a double-stranded structure by matching base pairs; the complementary sequence to G-T-A-C is C-A-T-G. Conserved sequence: A base sequence in a DNA molecule (or an amino acid sequence in a protein) that has remained essentially unchanged throughout evolution. Contig map: A map depicting the relative order of a linked library of small overlapping clones representing a complete chromosomal segment. Contigs: Groups of clones representing overlapping regions of a genome. Cosmid: Artificially constructed cloning vector containing the cos gene of phage lambda. Cosmids can be packaged in lambda phage particles for infection into E. coli; this permits cloning of larger DNA fragments (up to 45 kb) than can be introduced into bacterial hosts in plasmid vectors. Crossing over: The breaking during meiosis of one maternal and one paternal chromosome, the exchange of corresponding sections of DNA, and the rejoining of the chromosomes. This process can result in an exchange of alleles between chromosomes. Compare recombination. Cytosine (C): A nitrogenous base, one member of the base pair G-C (guanine and cytosine). Deoxyribonucleotide: See nucleotide. Diploid: A full set of genetic material, consisting of paired chromosomes, one chromosome from each parental set. Most animal cells except the gametes have a diploid set of chromosomes. The diploid human genome has 46 chromosomes. Compare haploid. DNA (deoxyribonucleic acid): The molecule that encodes genetic information. DNA is a double-stranded molecule held together by weak bonds between base pairs of nucleotides. The four nucleotides in DNA contain the bases: adenine (A), guanine (G), cytosine (C), and thymine (T). 
In nature, base pairs form only between A and T and between G and C; thus the base sequence of each single strand can be deduced from that of its partner. DNA probes: See probe. DNA replication: The use of existing DNA as a template for the synthesis of new DNA strands. In humans and other eukaryotes, replication occurs in the cell nucleus. DNA sequence: The relative order of base pairs, whether in a fragment of DNA, a gene, a chromosome, or an entire genome. See base sequence analysis. Domain: A discrete portion of a protein with its own function. The combination of domains in a single protein determines its overall function. Double helix: The shape that two linear strands of DNA assume when bonded together. E. coli: Common bacterium that has been studied intensively by geneticists because of its small genome size, normal lack of pathogenicity, and ease of growth in the laboratory. Electrophoresis: A method of separating large molecules (such as DNA fragments or proteins) from a mixture of similar molecules. An electric current is passed through a medium containing the mixture, and each kind of molecule travels through the medium at a different rate, depending on its electrical charge and size. Separation is based on these differences. Agarose and acrylamide gels are the media commonly used for electrophoresis of proteins and nucleic acids. Endonuclease: An enzyme that cleaves its nucleic acid substrate at internal sites in the nucleotide sequence. Enzyme: A protein that acts as a catalyst, speeding the rate at which a biochemical reaction proceeds but not altering the direction or nature of the reaction. EST: Expressed sequence tag. See sequence tagged site. Eukaryote: Cell or organism with membrane-bound, structurally discrete nucleus and other well-developed subcellular compartments. Eukaryotes include all organisms except viruses, bacteria, and blue-green algae. Compare prokaryote. See chromosomes. Evolutionarily conserved: See conserved sequence. 
Exogenous DNA: DNA originating outside an organism. Exons: The protein-coding DNA sequences of a gene. Compare introns. Exonuclease: An enzyme that cleaves nucleotides sequentially from free ends of a linear nucleic acid substrate.


Expressed gene: See gene expression. FISH (fluorescence in situ hybridization): A physical mapping approach that uses fluorescein tags to detect hybridization of probes with metaphase chromosomes and with the less-condensed somatic interphase chromatin. Flow cytometry: Analysis of biological material by detection of the light-absorbing or fluorescing properties of cells or subcellular fractions (i.e., chromosomes) passing in a narrow stream through a laser beam. An absorbance or fluorescence profile of the sample is produced. Automated sorting devices, used to fractionate samples, sort successive droplets of the analyzed stream into different fractions depending on the fluorescence emitted by each droplet. Flow karyotyping: Use of flow cytometry to analyze and/or separate chromosomes on the basis of their DNA content. Gamete: Mature male or female reproductive cell (sperm or ovum) with a haploid set of chromosomes (23 for humans). Gene: The fundamental physical and functional unit of heredity. A gene is an ordered sequence of nucleotides located in a particular position on a particular chromosome that encodes a specific functional product (i.e., a protein or RNA molecule). See gene expression. Gene expression: The process by which a gene’s coded information is converted into the structures present and operating in the cell. Expressed genes include those that are transcribed into mRNA and then translated into protein and those that are transcribed into RNA but not translated into protein (e.g., transfer and ribosomal RNAs). Gene families: Groups of closely related genes that make similar products. Gene library: See genomic library. Gene mapping: Determination of the relative positions of genes on a DNA molecule ( chromosome or plasmid) and of the distance, in linkage units or physical units, between them. Gene product: The biochemical material, either RNA or protein, resulting from expression of a gene. 
The amount of gene product is used to measure how active a gene is; abnormal amounts can be correlated with disease-causing alleles. Genetic code: The sequence of nucleotides, coded in triplets ( codons) along the mRNA, that determines the sequence of amino acids in protein synthesis. The DNA sequence of a gene can be used to predict the mRNA sequence, and the genetic code can in turn be used to predict the amino acid sequence. Genetic engineering technologies: See recombinant DNA technologies. Genetic map: See linkage map. Genetic material: See genome. Genetics: The study of the patterns of inheritance of specific traits. Genome: All the genetic material in the chromosomes of a particular organism; its size is generally given as its total number of base pairs. Genome projects: Research and technology development efforts aimed at mapping and sequencing some or all of the genome of human beings and other organisms. Genomic library: A collection of clones made from a set of randomly generated overlapping DNA fragments representing the entire genome of an organism. Compare library, arrayed library. Guanine (G): A nitrogenous base, one member of the base pair G-C (guanine and cytosine). Haploid: A single set of chromosomes (half the full set of genetic material), present in the egg and sperm cells of animals and in the egg and pollen cells of plants. Human beings have 23 chromosomes in their reproductive cells. Compare diploid. Helicases: Motor proteins that translocate unidirectionally along single-stranded nucleic acids using energy derived from nucleotide hydrolysis, often separating the two strands of a nucleic acid double helix in the process. Heterozygosity: The presence of different alleles at one or more loci on homologous chromosomes. Homeobox: A short stretch of nucleotides whose base sequence is virtually identical in all the genes that contain it. It has been found in many organisms from fruit flies to human beings. 
In the fruit fly, a homeobox appears to determine when particular groups of genes are expressed during development. Homologies: Similarities in DNA or protein sequences between individuals of the same species or among different species. Homologous chromosomes: A pair of chromosomes containing the same linear gene sequences, each derived from one parent. Human gene therapy: Insertion of normal DNA directly into cells to correct a genetic defect.


Human Genome Initiative: Collective name for several projects begun in 1986 by DOE to (1) create an ordered set of DNA segments from known chromosomal locations, (2) develop new computational methods for analyzing genetic map and DNA sequence data, and (3) develop new techniques and instruments for detecting and analyzing DNA. This DOE initiative is now known as the Human Genome Program. The national effort, led by DOE and NIH, is known as the Human Genome Project. Hybridization: The process of joining two complementary strands of DNA or one each of DNA and RNA to form a double-stranded molecule. Informatics: The study of the application of computer and statistical techniques to the management of information. In genome projects, informatics includes the development of methods to search databases quickly, to analyze DNA sequence information, and to predict protein sequence and structure from DNA sequence data. In situ hybridization: Use of a DNA or RNA probe to detect the presence of the complementary DNA sequence in cloned bacterial or cultured eukaryotic cells. Interphase: The period in the cell cycle when DNA is replicated in the nucleus; followed by mitosis. Introns: The DNA base sequences interrupting the protein-coding sequences of a gene; these sequences are transcribed into RNA but are cut out of the message before it is translated into protein. Compare exons. In vitro: Outside a living organism. Karyotype: A photomicrograph of an individual’s chromosomes arranged in a standard format showing the number, size, and shape of each chromosome type; used in low-resolution physical mapping to correlate gross chromosomal abnormalities with the characteristics of specific diseases. kb: See kilobase. Kilobase (kb): Unit of length for DNA fragments equal to 1000 nucleotides. Library: An unordered collection of clones (i.e., cloned DNA from a particular organism), whose relationship to each other can be established by physical mapping. 
Compare genomic library, arrayed library. Linkage: The proximity of two or more markers (e.g., genes, RFLP markers) on a chromosome; the closer together the markers are, the lower the probability that they will be separated during DNA repair or replication processes (binary fission in prokaryotes, mitosis or meiosis in eukaryotes), and hence the greater the probability that they will be inherited together. Linkage map: A map of the relative positions of genetic loci on a chromosome, determined on the basis of how often the loci are inherited together. Distance is measured in centimorgans (cM). Localize: Determination of the original position ( locus) of a gene or other marker on a chromosome. Locus (pl. loci): The position on a chromosome of a gene or other chromosome marker; also, the DNA at that position. The use of locus is sometimes restricted to mean regions of DNA that are expressed. See gene expression. Macrorestriction map: Map depicting the order of and distance between sites at which restriction enzymes cleave chromosomes. Mapping: See gene mapping, linkage map, physical map. Marker: An identifiable physical location on a chromosome (e.g., restriction enzyme cutting site, gene) whose inheritance can be monitored. Markers can be expressed regions of DNA (genes) or some segment of DNA with no known coding function but whose pattern of inheritance can be determined. See RFLP, restriction fragment length polymorphism. Mb: See megabase. Megabase (Mb): Unit of length for DNA fragments equal to 1 million nucleotides and roughly equal to 1 cM. Meiosis: The process of two consecutive cell divisions in the diploid progenitors of sex cells. Meiosis results in four rather than two daughter cells, each with a haploid set of chromosomes. Messenger RNA (mRNA): RNA that serves as a template for protein synthesis. See genetic code. Metaphase: A stage in mitosis or meiosis during which the chromosomes are aligned along the equatorial plane of the cell. 
Mitosis: The process of nuclear division in cells that produces daughter cells that are genetically identical to each other and to the parent cell. mRNA: See messenger RNA. Multifactorial or multigenic disorders: See polygenic disorders. Multiplexing: A sequencing approach that uses several pooled samples simultaneously, greatly increasing sequencing speed.


Mutation: Any heritable change in DNA sequence. Compare polymorphism. Nitrogenous base: A nitrogen-containing molecule having the chemical properties of a base. Nucleic acid: A large molecule composed of nucleotide subunits. Nucleotide: A subunit of DNA or RNA consisting of a nitrogenous base ( adenine, guanine, thymine, or cytosine in DNA; adenine, guanine, uracil, or cytosine in RNA), a phosphate molecule, and a sugar molecule (deoxyribose in DNA and ribose in RNA). Thousands of nucleotides are linked to form a DNA or RNA molecule. See DNA, base pair, RNA. Nucleus: The cellular organelle in eukaryotes that contains the genetic material. Oncogene: A gene, one or more forms of which is associated with cancer. Many oncogenes are involved, directly or indirectly, in controlling the rate of cell growth. Overlapping clones: See genomic library. PCR: See polymerase chain reaction. Phage: A virus for which the natural host is a bacterial cell. Physical map: A map of the locations of identifiable landmarks on DNA (e.g., restriction enzyme cutting sites, genes), regardless of inheritance. Distance is measured in base pairs. For the human genome, the lowest-resolution physical map is the banding patterns on the 24 different chromosomes; the highest- resolution map would be the complete nucleotide sequence of the chromosomes. Plasmid: Autonomously replicating, extrachromosomal circular DNA molecules, distinct from the normal bacterial genome and nonessential for cell survival under nonselective conditions. Some plasmids are capable of integrating into the host genome. A number of artificially constructed plasmids are used as cloning vectors. Polygenic disorders: Genetic disorders resulting from the combined action of alleles of more than one gene (e.g., heart disease, diabetes, and some cancers). 
Although such disorders are inherited, they depend on the simultaneous presence of several alleles; thus the hereditary patterns are usually more complex than those of single-gene disorders. Compare single-gene disorders. Polymerase chain reaction (PCR): A method for amplifying a DNA base sequence using a heat-stable polymerase and two 20-base primers, one complementary to the (+)-strand at one end of the sequence to be amplified and the other complementary to the (–)-strand at the other end. Because the newly synthesized DNA strands can subsequently serve as additional templates for the same primer sequences, successive rounds of primer annealing, strand elongation, and dissociation produce rapid and highly specific amplification of the desired sequence. PCR also can be used to detect the existence of the defined sequence in a DNA sample. Polymerase, DNA or RNA: Enzymes that catalyze the synthesis of nucleic acids on preexisting nucleic acid templates, assembling RNA from ribonucleotides or DNA from deoxyribonucleotides. Polymorphism: Difference in DNA sequence among individuals. Genetic variations occurring in more than 1% of a population would be considered useful polymorphisms for genetic linkage analysis. Compare mutation. Primer: Short preexisting polynucleotide chain to which new deoxyribonucleotides can be added by DNA polymerase. Probe: Single-stranded DNA or RNA molecules of specific base sequence, labeled either radioactively or immunologically, that are used to detect the complementary base sequence by hybridization. Prokaryote: Cell or organism lacking a membrane-bound, structurally discrete nucleus and other subcellular compartments. Bacteria are prokaryotes. Compare eukaryote. See chromosomes. Promoter: A site on DNA to which RNA polymerase will bind and initiate transcription. 
Protein: A large molecule composed of one or more chains of amino acids in a specific order; the order is determined by the base sequence of nucleotides in the gene coding for the protein. Proteins are required for the structure, function, and regulation of the body’s cells, tissues, and organs, and each protein has unique functions. Examples are hormones, enzymes, and antibodies. Purine: A nitrogen-containing, double-ring, basic compound that occurs in nucleic acids. The purines in DNA and RNA are adenine and guanine. Pyrimidine: A nitrogen-containing, single-ring, basic compound that occurs in nucleic acids. The pyrimidines in DNA are cytosine and thymine; in RNA, cytosine and uracil. Rare-cutter enzyme: See restriction enzyme cutting site. Recombinant clones: Clones containing recombinant DNA molecules. See recombinant DNA technologies.


Recombinant DNA molecules: A combination of DNA molecules of different origin that are joined using recombinant DNA technologies. Recombinant DNA technologies: Procedures used to join together DNA segments in a cell-free system (an environment outside a cell or organism). Under appropriate conditions, a recombinant DNA molecule can enter a cell and replicate there, either autonomously or after it has become integrated into a cellular chromosome. Recombination: The process by which progeny derive a combination of genes different from that of either parent. In higher organisms, this can occur by crossing over. Regulatory regions or sequences: A DNA base sequence that controls gene expression. Resolution: Degree of molecular detail on a physical map of DNA, ranging from low to high. Restriction enzyme, endonuclease: A protein that recognizes specific, short nucleotide sequences and cuts DNA at those sites. Bacteria contain over 400 such enzymes that recognize and cut over 100 different DNA sequences. See restriction enzyme cutting site. Restriction enzyme cutting site: A specific nucleotide sequence of DNA at which a particular restriction enzyme cuts the DNA. Some sites occur frequently in DNA (e.g., every several hundred base pairs), others much less frequently (rare-cutter; e.g., every 10,000 base pairs). Restriction fragment length polymorphism (RFLP): Variation between individuals in DNA fragment sizes cut by specific restriction enzymes; polymorphic sequences that result in RFLPs are used as markers on both physical maps and genetic linkage maps. RFLPs are usually caused by mutation at a cutting site. See marker. RFLP: See restriction fragment length polymorphism. Ribonucleic acid (RNA): A chemical found in the nucleus and cytoplasm of cells; it plays an important role in protein synthesis and other chemical activities of the cell. The structure of RNA is similar to that of DNA. 
There are several classes of RNA molecules, including messenger RNA, transfer RNA, ribosomal RNA, and other small RNAs, each serving a different purpose. Ribonucleotides: See nucleotide. Ribosomal RNA (rRNA): A class of RNA found in the ribosomes of cells. Ribosomes: Small cellular components composed of specialized ribosomal RNA and protein; site of protein synthesis. See ribonucleic acid ( RNA). RNA: See ribonucleic acid. Sequence : See base sequence. Sequence tagged site (STS): Short (200 to 500 base pairs) DNA sequence that has a single occurrence in the human genome and whose location and base sequence are known. Detectable by polymerase chain reaction, STSs are useful for localizing and orienting the mapping and sequence data reported from many different laboratories and serve as landmarks on the developing physical map of the human genome. Expressed sequence tags (ESTs) are STSs derived from cDNAs. Sequencing: Determination of the order of nucleotides ( base sequences) in a DNA or RNA molecule or the order of amino acids in a protein. Sex chromosomes: The X and Y chromosomes in human beings that determine the sex of an individual. Females have two X chromosomes in diploid cells; males have an X and a Y chromosome. The sex chromosomes comprise the 23rd chromosome pair in a karyotype. Compare autosome. Shotgun method: Cloning of DNA fragments randomly generated from a genome. See library, genomic library. Single-gene disorder: Hereditary disorder caused by a mutant allele of a single gene (e.g., Duchenne muscular dystrophy, retinoblastoma, sickle cell disease). Compare polygenic disorders. SINE: (short interspersed elements) = these are retrotransposon mobile elements that propogate in the host genome by a “copy and paste” mechanism. There are insertions across the genome and are particularly active in dogs. Several canine hereditary diseases result from gene disruption following a SINE insertion. SNP: Single nucleotide polymorphism. 
A biallelic variation at a specific nucleotide of the genome that is inherited in a Mendelian fashion. Commonly used for mapping purposes. Somatic cells: Any cell in the body except gametes and their precursors. Southern blotting: Transfer by absorption of DNA fragments separated in electrophoretic gels to membrane filters for detection of specific base sequences by radiolabeled complementary probes. STS: See sequence tagged site. Tandem repeat sequences: Multiple copies of the same base sequence on a chromosome; used as a marker in physical mapping.
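The entries above for restriction enzyme cutting site and RFLP can be illustrated with a minimal Python sketch. It assumes the EcoRI recognition sequence GAATTC, cut between the G and the first A; the two allele sequences and the function name `digest` are hypothetical and chosen only for illustration.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Return the fragment lengths produced by cutting `sequence` at every `site`.

    `cut_offset` is the position within the site where the cut falls
    (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return [len(fragment) for fragment in fragments]


# Hypothetical alleles: allele_b carries a single base change (A -> T)
# that destroys the second cutting site.
allele_a = "ATGCGAATTCTTAGGCCGAATTCAAT"
allele_b = "ATGCGAATTCTTAGGCCGATTTCAAT"

lengths_a = digest(allele_a)
lengths_b = digest(allele_b)
print(lengths_a)  # [5, 13, 8]
print(lengths_b)  # [5, 21]
```

The loss of one cutting site merges two fragments into one longer fragment, which is the difference in fragment sizes that an RFLP assay detects.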

Technology transfer: The process of converting scientific findings from research laboratories into useful products by the commercial sector.
Telomere: The end of a chromosome. These specialized structures are involved in the replication and stability of linear DNA molecules. See DNA replication.
Thymine (T): A nitrogenous base, one member of the base pair A-T (adenine-thymine).
Transcription: The synthesis of an RNA copy from a sequence of DNA (a gene); the first step in gene expression. Compare translation.
Transfer RNA (tRNA): A class of RNA having structures with triplet nucleotide sequences that are complementary to the triplet nucleotide coding sequences of mRNA. The role of tRNAs in protein synthesis is to bond with amino acids and transfer them to the ribosomes, where proteins are assembled according to the genetic code carried by mRNA.
Transformation: A process by which the genetic material carried by an individual cell is altered by incorporation of exogenous DNA into its genome.
Translation: The process in which the genetic code carried by mRNA directs the synthesis of proteins from amino acids. Compare transcription.
tRNA: See transfer RNA.
Uracil: A nitrogenous base normally found in RNA but not DNA; uracil is capable of forming a base pair with adenine.
Vector: See cloning vector.
Virus: A noncellular biological entity that can reproduce only within a host cell. Viruses consist of nucleic acid covered by protein; some animal viruses are also surrounded by membrane. Inside the infected cell, the virus uses the synthetic capability of the host to produce progeny virus.
VLSI: Very large-scale integration allowing over 100,000 transistors on a chip.
YAC: See yeast artificial chromosome.
Yeast artificial chromosome (YAC): A vector used to clone DNA fragments (up to 400 kb); it is constructed from the telomeric, centromeric, and replication origin sequences needed for replication in yeast cells. Compare cloning vector, cosmid.
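The transcription and translation entries above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the input is taken to be the coding strand, so the mRNA is the same sequence with uracil in place of thymine; `CODON_TABLE` is a deliberately partial excerpt of the standard genetic code; and the example sequence and function names are hypothetical.

```python
# Partial excerpt of the standard genetic code (illustration only).
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys", "UGG": "Trp",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand: same sequence, with U replacing T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three bases at a time and return the amino acids up to the stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError if the codon is not in the excerpt
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


dna = "ATGTTTGGCAAATGGTAA"  # hypothetical coding sequence
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)               # AUGUUUGGCAAAUGGUAA
print("-".join(protein))  # Met-Phe-Gly-Lys-Trp
```

In the cell, each codon is matched by the complementary triplet of a tRNA carrying the corresponding amino acid; the dictionary lookup stands in for that step.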
