Basics in Genetics Analysis

Genetics and Diseases Basics in Genetics Analysis Heping Zhang Environment 9/24/2007 Dr. Doug Brutlag Lecture Syllabus “central paradigm” //www.s-star.org/ 2 Diseases Progression Known and Probable Risk Factors z Being a woman How does the Breast Cancer grows and spread? z Getting older z Having a personal history of BC or ovarian cancer z Having a family history of breast cancer z Having a previous biopsy showing carcinoma in situ z Having your first period before age 12 z Starting menopause after age 55 A malignant Malignant Cancer cells z Never having children tumor within tumors spread z Having your first child after age 30 the breast suppress z Having a mutation in the BRCA1 or BRCA2 genes normal ones z Drinking more than 1 alcoholic drink per day z Being overweight after menopause or gaining weight as an adult. 9/24/2007 3 9/24/2007 4 Genetic Epidemiology Terminology • How is a disease transmitted in families? ( Marker: a known DNA sequence that can be identified by a Inheritance patterns). simple assay e.g., D1S80, D4S43, D16S126 • What is the recurrence Allele: a viable DNA coding that occupies a given locus risk for relatives? (position) on a chromosome e.g., A, a; • Mendelian disorders Genotype: the observed alleles at a genetic locus for an • Autosomal or X-linked individual • Dominant or recessive e.g., AA, Aa, aa; 0 Homozygous: AA, aa 0 Heterozygous: Aa Phenotype: the expression of a particular genotype 0 Continuous: blood pressure 0 Dichotomous: Cancer, Hypertension 9/24/2007 Duke University Center of Human Genetics 5 9/24/2007 6 1 Mendel’s Laws DNA Polymorphism and Human Variation First Law Segregation of Characteristics: the sex cell of a plant or animal may contain one factor (allele) for different traits but not both factors needed to express the traits. Second Law Independent Assortment: For two characteristics, the genes are inherited independently. Third Law Dominants and Recessives: each inherited characteristic is determined by two heredity factors/genes, one from each parent which determine whether a gene will be dominant or recessive. 9/24/2007 7 9/24/2007 8 DNA Polymorphism RFLP Restriction Fragment Length A technique discovered in 1975 in which organisms may be differentiated by analysis of patterns derived from cleavage Polymorphism of their DNA Variable Number of Tandem Repeats Only two alleles: present or not present 0Minisatellites 0Microsatellites Single Nucleotide Polymorphism 0Single-base substitutions 0Single-base insertion/deletions 9/24/2007 9 9/24/2007 http://www.people.virginia.edu/~rjh9u/hbsrflp.html 10 VNTR SNP Successively repeating blocks of oligonucleotide of variable lengths (Nakamura, et al. 1987) Two families of VNTR Minisatellite A DNA sequence variation Sequences of 11-16 bp repeated 1000 times. occurring when a single Highly repetitive and dispersed into the genome nucleotide - A, T, C, or G - in Microsatellite (Short Tandem Repeat) the genome (or other shared Short sequences of 100-200 bp given by the repetition of 1-6 bp sequence) differs between sequences members of a species (or between paired chromosomes in an individual) 9/24/2007 http://www.people.virginia.edu/~rjh9u/vntr1.html 11 9/24/2007 http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism 12 2 Characteristics of SNP Approaches: Linkage and Association The most common genetic polymorphism Linkage studies use individual families where members are affected and attempt to demonstrate Distributed throughout genome with high density linkage between the occurrence of the disease More stable and easy to assay and genetic markers (creates associations within Major cause of genetic diversity among different families, but not among unrelated people) (normal) individuals Association studies are based on populations and Facilitates large scale genetic association studies as attempt to show an association between a genetic markers. particular allele and susceptibility to disease (a statistical statement about the co-occurrence of Most of SNPs neither change protein synthesis nor alleles and phenotypes) cause disease directly Serve as landmarks: may be physically close to the mutation site on the chromosome Shared among groups of people with common characteristics 9/24/2007 13 9/24/2007 14 Linkage Studies Pedigree Goal To obtain a crude chromosomal location of the gene or genes associated with a phenotype of interest, e.g. a genetic disease or an important quantitative traits. Co-inheritance of Disease and Marker Genes in Families Mendel’s 2nd Law states that disease genes and genetic markers are inherited independently. However, for a marker in close proximity to a disease locus, their genes may go together in family pedigrees (creating genetic linkage), only occasionally interrupted by “crossing- over. 9/24/2007 15 9/24/2007 16 Recombination Recombination Random segregation of chromatids Crossover between homologous pairs of chromosomes 2 x 23 human chromosomes => 223 possible haploid combinations Genes on the same chromosome recombine with probability θ depending on their distance and Genes on different chromosomes location on the chromosome recombine with probability θ = .5 9/24/2007 www.gen.umn.edu/faculty_staff/hatch/1131/ 17 9/24/2007 18 3 LOD (Log Ratio of Odds) Scores Linkage Distance between Genes Tests a genetic model which states that the two Recombination A O B O Lod Score markers (or a disease locus and a genetic marker) Frequency D d d d are linked with a recombination fraction of θ and 0.050 0.951 which requires that parameters are specified: A B O O D d d d z Dominant or recessive 0.100 1.088 z Degree of penetrance z Allele frequencies 0.125 1.099 LOD score : 0.150 1.090 Likelihood of two markers are linked log 0.200 1.031 Likelihood of two markers are unlinked A O A O D d d d 0.250 0.932 9/24/2007 19 9/24/2007 20 Model-based Multipoint Linkage Analysis Model-based Multipoint Linkage Analysis If we have two markers tightly linked to the initial marker (no recombination among them), D+ ++ then there is no recombination No parent is doubly 11 22 +D ++ 11 12 in any of the six offspring. The 11 12 heterogeneous and 11 22 likelihood ratio for it is hence the pedigree is Z(θ ) = 6log(1−θ) − 6log(0.5) not informative. +D ++ The max LOD =1.8, reached LOD=0. 21 22 when θ = 0. +D ++ 11 22 11 22 21 22 D+ D+ D+ ++ D+ ++ 12 12 12 22 12 22 +D +D +D ++ +D ++ 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 22 12 22 9/24/2007 21 9/24/2007 22 Model-free Linkage Analysis Sib Pair Studies Identity by Descent (IBD): Two individuals are For each sib pair, count the number of alleles said to share a gene identity by descent if one shared IBD. gene is a directly-inherited copy of the other or Average the counts over all sib pairs. if they are both directly-inherited copies of a common ancestral gene. Estimate the expectation and variable under the null hypothesis of no linkage. Identify by State (IBS): Two individuals are said to share an allele identity by state if they both Standardize the statistic and compare it with have that allele (not necessarily inherited from the standard normal distribution. the same ancestral). Perform a one-sided test. Idea of model-free linkage analysis Test the allele sharing for each affected relative pair (or larger group of affected relatives) against the expected sharing in the absence of linkage. The usual measure of allele sharing is IBD 9/24/2007 23 9/24/2007 24 4 Linkage Equilibrium Linkage Disequilibrium (LD) Risch and Merikangas (1996 Science) pointed out that Genes which are not in linkage equilibrium for genes of modest effect, strategies employing linkage disequilibrium (LD) would be more powerful than Genes which occur more often than traditional linkage analysis. expected from the law of independent assortment State of random association between alleles at different Non-random association in a population of markers Gamete freq LE alleles at two closely linked loci based on Locus 1 A--- p1 having a common ancestor AB p11 p1q1 a--- p2 Ab p12 p1q2 Locus 2 B --- q1 aB p21 p2q1 b --- q2 ab p22 p2q2 9/24/2007 25 9/24/2007 26 How Does LD Occur How Is LD Measured? Alleles that are closely linked will be commonly inherited D, Dmax and D’ But, in time, disequilibrium will disappear due to D = fAB-fAfB = fabfAB -fAbfaB recombination (i.e. Allele frequencies will equalize). If two alleles 1 Mb apart are in disequilibrium, then in 70 D = 0 LE; D < 0 or > 0 LD generations the disequilibrium will decay by 50% Dmax = min(fAfb, fafB), when D > 0 Any new mutation (allele) will occur on a specific chromosome and the mutated allele will be associated with = max(- fAfB, - fafb), when D < 0 the alleles present at all loci on that chromosome D'=D/Dmax With more meiotic events, recombination between loci causes decay of LD and the alleles return to equilibrium 0≤|D’| ≤1 The decay takes longer for alleles closely linked due to less D’ is the measure of the strength of LD chance of recombination 9/24/2007 27 9/24/2007 28 Linkage vs. LD Linkage vs. LD 1. Linkage focuses on a locus, LD on an allele 2. Linkage is resulted from recombination events in the last 2-3 generations, LD from much earlier, ancestral recombination events 3. Linkage measures co-segregation in a pedigree, LD in a population (essentially a huge huge pedigree) All loci are “linked” to the unobserved disease allele 4.

Basics in Genetics Analysis

Genetic, Epidemiological and Biological Analysis of Interleukin-10

PLINK: a Toolset for Whole Genome Association and Population-Based Linkage Analyses

Genetic Association Tests for Binary Traits with An

Design and Analysis of Genetic Association Studies

Genome-Wide Association Studies: Understanding the Genetics of Common Disease the Academy of Medical Sciences | FORUM

Study of Genetic Association

Genetic Association Analysis of SARS-Cov-2 Infection in 455,838 UK Biobank Participants

Characterization of Quantitative Traits Using Association Genetics

[Thesis Title Goes Here]

Geographic Confounding in Genome-Wide Association Studies

A Tutorial on Statistical Methods for Population Association Studies

Case-Control Association Tests Correcting for Population Stratification