Lecture 1
Introduction I: Pedigrees, genetics and probabilities
Magnus Dehli Vigeland
Statistical methods in genetic relatedness and pedigree analysis
NORBIS course, 6th – 10th of January 2020, Oslo

Outline
• Part I: Pedigrees
  – Pedigree symbols and terminology
  – Some common relationships
• Part II: Genetics
  – Terminology: locus, allele, genotype, marker
  – Mendelian inheritance: autosomal, X, Y
• Part III: Pedigree likelihoods
  – Motivation: real-life problems
  – Ingredients: Hardy-Weinberg equilibrium, Mendelian transition probabilities
  – Likelihoods by hand
  – Computer algorithms
Pedigrees: Symbols and terminology
• Square = male, circle = female
• Founders: individuals with no parents included in the pedigree
• Nonfounders: individuals whose parents are included in the pedigree
• Consanguineous marriage: a union between related individuals
Medical pedigrees:
• filled symbol = affected
• open symbol = unaffected
• symbol with a dot = carrier of disease allele
Alternative ways of drawing pedigrees
[Figure: the same pedigree drawn in three styles: standard, simplified, and directed acyclic graph]
Some common relationships (and some less common...)
Cousin relationships
• Full siblings
• First cousins
• Second cousins
• First cousins once removed
• Aunt - nephew
• Grandaunt
Half cousin relationships
• Half siblings (paternal)
• Half first cousins
• Half second cousins
• Half aunt / half nephew
More complicated relationships
• 3/4 siblings
• What about this? Double first cousins
• The connoisseur's favourite: quadruple half first cousins!
Genetics
• Human genome:
  – Diploid
  – 22 pairs of autosomes
  – Sex chromosomes: X and Y
• Some important terms
  – Locus
  – Allele
  – Genotype
  – Genetic markers: SNPs, microsatellites
Locus, allele, genotype
[Figure: a pair of homologous chromosomes, labelled M and F, carrying the alleles A and B at the same locus; genotype: A/B]
• LOCUS = a specific place in the genome, e.g. a base pair, a gene or a region
• ALLELE = any of the alternative forms of a locus
• GENOTYPE = the set of alleles carried by an individual at a given locus
Genetic markers
• Small parts of the genome which ...
  – have known position
  – vary in the population
  – are easy to genotype
• SNPs (single nucleotide polymorphisms)
  – two alleles
  – MAF = minor allele frequency
  – usual requirement: MAF > 1%
  – very common in the genome (millions!)
  – used in medical genetics +++
  – example (the variable site is T or G):
      ...CCGTTATATGGGC...
      ...CCGTTAGATGGGC...
      ...CCGTTATATGGGC...
      ...CCGTTATATGGGC...
      ...CCGTTAGATGGGC...
• STRs (short tandem repeats) = microsatellites
  – consecutive repeats of 2-5 bases
  – multiallelic: 5 - 50 alleles
  – allele names: # repeats
  – used in forensics
  – example (repeat unit TTAG; alleles 4, 2 and 5):
      ...ACG TTAG TTAG TTAG TTAG AAC..
      ...ACG TTAG TTAG AAC..
      ...ACG TTAG TTAG TTAG TTAG TTAG AAC..
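The naming rule for STR alleles (allele name = number of repeats) can be illustrated with a minimal sketch. The function name `str_allele` and the input strings are hypothetical; real STR genotyping involves considerably more than pattern counting.

```python
import re

def str_allele(sequence: str, unit: str) -> int:
    """Return the number of repeat units in the longest consecutive run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)

# The three example sequences from the slide, with spaces removed (illustrative input).
reads = [
    "ACGTTAGTTAGTTAGTTAGAAC",
    "ACGTTAGTTAGAAC",
    "ACGTTAGTTAGTTAGTTAGTTAGAAC",
]
alleles = [str_allele(read, "TTAG") for read in reads]
print(alleles)
```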
Mendelian inheritance: Autosomal (chromosomes 1-22)
Example: autosomal marker with 3 alleles: A, B, C
[Figure: pedigree with founders A/A (homozygous) and B/C (heterozygous); further genotypes shown: A/B, A/C, A/B, B/C]
• Probability of transmitting either allele: always 50%
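The 50% rule can be turned into a small calculation of a child's genotype distribution. This is only a sketch: the function name `offspring_distribution` and the tuple representation of genotypes are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_distribution(parent1, parent2):
    """Genotype distribution of a child at an autosomal marker.

    Each parent is a tuple of two alleles and transmits either one
    with probability 1/2, independently of the other parent.
    """
    dist = Counter()
    for a, b in product(parent1, parent2):
        genotype = "/".join(sorted((a, b)))  # unordered genotype, e.g. "A/B"
        dist[genotype] += Fraction(1, 4)
    return dict(dist)

# Founders from the slide: A/A (homozygous) and B/C (heterozygous).
dist = offspring_distribution(("A", "A"), ("B", "C"))
print(dist)
```

For these parents the only possible children are A/B and A/C, each with probability 1/2.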
Mendelian inheritance: X-linked
Example: X-linked marker with 3 alleles: A, B, C
[Figure: pedigree with a male founder A (males are hemizygous) and a female founder B/C; further genotypes shown: A/C, A/B, C, A]
• no transmission from father to son
• forced transmission from father to daughter
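The two X-linked rules (nothing passes from father to son, the father's allele always passes to a daughter) can be sketched in the same way. The function name `x_child_distribution` and the data layout are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def x_child_distribution(father, mother, child_sex):
    """Genotype distribution of a child at an X-linked marker.

    father: a single allele (males are hemizygous)
    mother: a tuple of two alleles
    child_sex: "male" or "female"
    """
    dist = Counter()
    for m in mother:  # the mother transmits either allele with probability 1/2
        if child_sex == "male":
            genotype = m  # sons receive no X allele from the father
        else:
            genotype = "/".join(sorted((father, m)))  # father's allele is forced
        dist[genotype] += Fraction(1, 2)
    return dict(dist)

sons = x_child_distribution("A", ("B", "C"), "male")
daughters = x_child_distribution("A", ("B", "C"), "female")
print(sons, daughters)
```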
Mendelian inheritance: Y-linked
Example: Y-linked marker with 2 alleles: A, B
[Figure: pedigree with male genotypes A, B, B]
• no transmission involving females
• father-son transmission is forced
Assumptions throughout (most of) this course
• Diploid species
• No cytogenetic abnormalities
• No de novo mutations
COFFEE BREAK!
Questions related to pedigrees with genotypes
• Will my child have the disease?
• Is NN the true father?
• Brothers or half brothers?
• Is NN related to this family? How?
• Predict the missing genotype?
Will my child have the disease?
[Figure: pedigree for a disease locus with alleles D and N; one individual has genotype D/N and the child's genotype is unknown (?)]
Who is the true father?
[Figure: individuals 1 (11/13) and 2 (13/18), an untyped individual (-/-), and a child with genotype 11/18]
Suppose:
• 11 is common
• 18 is rare
Brothers or half brothers?
Is this woman related to the family?
[Figure: pedigree with genotypes at four markers for four individuals]
  12/14  32/40  7/11   6/21
  11/14  32/40  13/13  6/25
  12/16  34/40  7/7    12/21
  11/16  32/41  7/13   6/25
Can we predict the missing genotype?
[Figure: pedigree with genotypes A/B, A/A, A/A, A/B, A/B and one individual with missing genotype ?/?]
• Common to all of these: The need to calculate probabilities
P( genotypes | pedigree, marker info, allele freqs, .. )
• Called the likelihood of the pedigree.
Ingredients for likelihood computations
• founder probabilities
• transition probabilities
• untyped individuals
[Figure: pedigree with genotypes A/B, A/A, A/A, A/A, A/B and one untyped individual (-/-), annotated with the three ingredients]
Ingredient 1: Founder probabilities
• Suppose the allele frequencies are:
  $P(A) = p, \quad P(B) = q$
• What are the frequencies of the genotypes AA, AB, BB?
• Under certain assumptions, the alleles can be treated as independent:
  $P(AA) = P(A) \cdot P(A) = p^2$
  $P(BB) = P(B) \cdot P(B) = q^2$
  $P(AB) = P(\text{AB or BA}) = pq + qp = 2pq$ (two possible orderings!)
The Hardy-Weinberg principle
Assumptions:
• infinite population
• random mating
• no selection
• no mutations
• no migration
Hardy (1908) shows, «... using a little mathematics of the multiplication table kind»:
• allele freqs are unchanged from generation to generation
• after 1 generation the genotype freqs stay unchanged
HW equilibrium:
  $P(AA) = p^2, \quad P(AB) = 2pq, \quad P(BB) = q^2$
[Figure: allele frequencies (A: p, B: q) and genotype frequencies (AA, AB, BB) over successive generations; arbitrary initial genotype frequencies become $p^2$, $2pq$, $q^2$ after one generation and stay there]
Allele frequencies and genotype frequencies
• From allele frequencies to genotype frequencies (assuming HWE):
  $p_{AA} = p^2, \quad p_{AB} = 2pq, \quad p_{BB} = q^2$
• From genotype frequencies to allele frequencies (always):
  $p = p_{AA} + 0.5\, p_{AB}, \quad q = p_{BB} + 0.5\, p_{AB}$
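Both directions of this correspondence can be checked numerically. The following is a minimal sketch for a marker with two alleles; the function names `hw_genotype_freqs` and `allele_freqs` and the value p = 0.3 are illustrative assumptions.

```python
def hw_genotype_freqs(p):
    """Genotype frequencies under Hardy-Weinberg equilibrium, with P(A) = p and P(B) = 1 - p."""
    q = 1 - p
    return {"AA": p**2, "AB": 2 * p * q, "BB": q**2}

def allele_freqs(geno):
    """Allele frequencies from genotype frequencies; valid with or without HWE."""
    p = geno["AA"] + 0.5 * geno["AB"]
    q = geno["BB"] + 0.5 * geno["AB"]
    return p, q

geno = hw_genotype_freqs(0.3)
p, q = allele_freqs(geno)
print(geno, p, q)

# A population far from HWE (no heterozygotes) still has well-defined allele frequencies.
print(allele_freqs({"AA": 0.5, "AB": 0.0, "BB": 0.5}))
```

The second call shows why the reverse direction holds "always": counting alleles needs no assumption about mating.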
Ingredient 2: Transition probabilities $P(g_{\text{child}} \mid g_{\text{parents}})$
• Easy - follows directly from Mendel's laws!

  parents    child: AA    AB      BB
  AA×AA             1     0       0
  AA×AB             0.5   0.5     0
  AA×BB             0     1       0
  AB×AA             0.5   0.5     0
  AB×AB             0.25  0.5     0.25
  AB×BB             0     0.5     0.5
  BB×AA             0     1       0
  BB×AB             0     0.5     0.5
  BB×BB             0     0       1
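The table can be regenerated from Mendel's first law by enumerating the four equally likely allele pairs a child can receive. This is a sketch for a marker with two alleles; the names `GENOTYPES`, `transition` and `table` are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

GENOTYPES = ["AA", "AB", "BB"]

def transition(child, parent1, parent2):
    """P(child genotype | parental genotypes) at an autosomal marker with alleles A and B."""
    prob = Fraction(0)
    for a, b in product(parent1, parent2):  # four equally likely allele pairs
        if "".join(sorted(a + b)) == child:
            prob += Fraction(1, 4)
    return prob

# One row per ordered pair of parental genotypes, one column per child genotype.
table = {
    (g1, g2): [transition(child, g1, g2) for child in GENOTYPES]
    for g1, g2 in product(GENOTYPES, repeat=2)
}

for (g1, g2), row in table.items():
    print(f"{g1}x{g2}", *(str(x) for x in row))
```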
Example
[Figure: trio with parents 1 (A/A) and 2 (A/B) and child 3 (A/B)]

$L = P(g_1, g_2, g_3)$
$= P(g_1) \cdot P(g_2) \cdot P(g_3 \mid g_1, g_2)$
$= P(AA) \cdot P(AB) \cdot P(AB \mid \text{parents} = AA \times AB)$
$= p^2 \cdot 2pq \cdot 0.5$
$= p^3 q$ (assuming HWE!)
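The hand calculation can be verified numerically. The sketch below assumes HWE for the founders and uses an arbitrary illustrative value p = 0.2; the helper names `hw`, `transition` and `trio_likelihood` are hypothetical.

```python
from itertools import product

def hw(genotype, p):
    """Founder probability under HWE for a marker with alleles A (freq p) and B (freq 1 - p)."""
    q = 1 - p
    return {"AA": p * p, "AB": 2 * p * q, "BB": q * q}[genotype]

def transition(child, parent1, parent2):
    """Mendelian probability of the child's genotype given both parental genotypes."""
    hits = sum("".join(sorted(a + b)) == child for a, b in product(parent1, parent2))
    return hits / 4

def trio_likelihood(g1, g2, g3, p):
    """Likelihood of a fully typed trio: two founders (g1, g2) and their child (g3)."""
    return hw(g1, p) * hw(g2, p) * transition(g3, g1, g2)

p = 0.2
L = trio_likelihood("AA", "AB", "AB", p)
print(L, p**3 * (1 - p))  # the two numbers should agree up to rounding
```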
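Example on X
[Figure: male 1 (A) and female 2 (A/B) are the parents of female 3 (A/B) and male 4 (B); male 4 and female 5 (B/B) are the parents of female 6 (B/B)]

$L = P(\text{genotypes} \mid \text{pedigree}, p, q)$
$= p \cdot 2pq \cdot 0.5 \cdot 0.5 \cdot q^2 \cdot 1$ (contributions from individuals 1, 2, 3, 4, 5, 6 in that order)
$= 0.5\, p^2 q^3$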
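The six factors can be reproduced in code. This sketch assumes the pedigree structure read off the slide (1 and 2 are the parents of 3 and 4; 4 and 5 are the parents of 6) and HWE for female founders. The helpers `founder_x` and `transition_x` and the value p = 0.3 are illustrative, not part of any package.

```python
def founder_x(genotype, sex, freqs):
    """Founder probability at an X-linked marker; males carry a single allele."""
    if sex == "male":
        return freqs[genotype]
    a, b = genotype.split("/")
    return freqs[a] * freqs[b] * (1 if a == b else 2)

def transition_x(child, sex, father, mother):
    """P(child genotype | parents) on X: sons get only a maternal allele,
    daughters get the father's allele plus one maternal allele."""
    m1, m2 = mother.split("/")
    if sex == "male":
        return ((child == m1) + (child == m2)) / 2
    target = sorted(child.split("/"))
    return sum(0.5 for m in (m1, m2) if sorted((father, m)) == target)

p = 0.3
freqs = {"A": p, "B": 1 - p}

L = (
    founder_x("A", "male", freqs)                 # individual 1
    * founder_x("A/B", "female", freqs)           # individual 2
    * transition_x("A/B", "female", "A", "A/B")   # individual 3
    * transition_x("B", "male", "A", "A/B")       # individual 4
    * founder_x("B/B", "female", freqs)           # individual 5
    * transition_x("B/B", "female", "B", "B/B")   # individual 6
)
print(L, 0.5 * p**2 * (1 - p)**3)  # the two numbers should agree up to rounding
```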
Ingredient 3: How to deal with untyped individuals
Solution: Sum over all possible genotypes for the untyped
[Figure: trio with parent 1 (A/A), untyped parent 2 (-/-) and child 3 (A/B)]

$P(g_1, g_3) = \sum_{g_2} P(g_1, g_2, g_3) = \sum_{g_2} P(g_1) \cdot P(g_2) \cdot P(g_3 \mid g_1, g_2)$
$= P(AA) \cdot P(AA) \cdot P(AB \mid AA \times AA) + P(AA) \cdot P(AB) \cdot P(AB \mid AA \times AB) + P(AA) \cdot P(BB) \cdot P(AB \mid AA \times BB)$
$= p^2 \cdot p^2 \cdot 0 + p^2 \cdot 2pq \cdot 0.5 + p^2 \cdot q^2 \cdot 1$
$= p^3 q + p^2 q^2 = p^2 q (p + q) = p^2 q$ (since $p + q = 1$)
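The summation over the untyped parent can be checked numerically. The sketch assumes HWE and an illustrative allele frequency p = 0.3; the helper names are hypothetical.

```python
from itertools import product

GENOTYPES = ("AA", "AB", "BB")

def hw(genotype, p):
    """Founder probability under HWE, with P(A) = p and P(B) = 1 - p."""
    q = 1 - p
    return {"AA": p * p, "AB": 2 * p * q, "BB": q * q}[genotype]

def transition(child, parent1, parent2):
    """Mendelian probability of the child's genotype given both parental genotypes."""
    hits = sum("".join(sorted(a + b)) == child for a, b in product(parent1, parent2))
    return hits / 4

p = 0.3
q = 1 - p

# One term per possible genotype of the untyped parent (individual 2).
terms = [hw("AA", p) * hw(g2, p) * transition("AB", "AA", g2) for g2 in GENOTYPES]
L = sum(terms)
print(terms, L, p**2 * q)  # L should agree with p^2 q up to rounding
```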
Pedigree likelihood: General formula
• Given:
  – pedigree with n individuals
  – k members are genotyped: $g_1, g_2, \ldots, g_k$
• Then:
  $P(g_1, \ldots, g_k) = \sum_{G_1} \sum_{G_2} \cdots \sum_{G_n} P(g_1) \cdots P(g_j) \cdot P(g_{j+1} \mid \text{par}) \cdots P(g_n \mid \text{par})$
  where individuals $1, \ldots, j$ are the founders, individuals $j+1, \ldots, n$ are the non-founders, and $G_i$ = set of possible genotypes for individual i
• If everyone is typed: Only one term → easy
• Number of terms grows exponentially in #(untyped) – but clever algorithms exist!
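The general formula can be written as a brute-force enumeration over all genotype assignments of the untyped individuals. The sketch below is deliberately naive (its cost is exponential in the number of untyped individuals) and is not how the algorithms used in practice work. It assumes a single autosomal marker with two alleles, HWE for founders and no mutations; the function `pedigree_likelihood` and its input format are hypothetical.

```python
from itertools import product

GENOTYPES = ("AA", "AB", "BB")

def hw(genotype, p):
    """Founder probability under HWE, with P(A) = p and P(B) = 1 - p."""
    q = 1 - p
    return {"AA": p * p, "AB": 2 * p * q, "BB": q * q}[genotype]

def transition(child, parent1, parent2):
    """Mendelian probability of the child's genotype given both parental genotypes."""
    hits = sum("".join(sorted(a + b)) == child for a, b in product(parent1, parent2))
    return hits / 4

def pedigree_likelihood(parents, observed, p):
    """Sum the product of founder and transition probabilities over all
    genotype assignments of the untyped individuals.

    parents:  dict id -> (parent_id, parent_id), or None for founders
    observed: dict id -> genotype, for the typed individuals only
    """
    ids = list(parents)
    untyped = [i for i in ids if i not in observed]
    total = 0.0
    for combo in product(GENOTYPES, repeat=len(untyped)):
        g = dict(observed)
        g.update(zip(untyped, combo))
        term = 1.0
        for i in ids:
            par = parents[i]
            if par is None:
                term *= hw(g[i], p)
            else:
                term *= transition(g[i], g[par[0]], g[par[1]])
        total += term
    return total

trio = {1: None, 2: None, 3: (1, 2)}
p = 0.3
L_all_typed = pedigree_likelihood(trio, {1: "AA", 2: "AB", 3: "AB"}, p)
L_one_untyped = pedigree_likelihood(trio, {1: "AA", 3: "AB"}, p)
print(L_all_typed, L_one_untyped)
```

With everyone typed the loop runs once; each untyped individual multiplies the number of terms by three, which is the exponential growth mentioned above.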
Computer algorithms for pedigree likelihoods
• Elston-Stewart algorithm
  – a peeling algorithm
  – linear in pedigree size!
• Lander-Green
  – based on inheritance vectors
  – hidden Markov model
  – best choice with many linked markers
  – small/medium pedigrees only
Software
• R/pedprobr
  – Part of the ped suite
  – Elston-Stewart
  – general likelihoods, inbreeding, genotype distributions ++
• Familias
  – GUI for forensic applications
  – Elston-Stewart
  – handles mutations, HW deviations, ++
• MERLIN
  – command line program
  – Lander-Green
  – medical applications: multipoint linkage